SlideShare ist ein Scribd-Unternehmen logo
1 von 66
Downloaden Sie, um offline zu lesen
Is Your IndexReader Really
  Atomic or Maybe Slow?
         Uwe Schindler
    SD DataSolutions GmbH,
 uschindler@sd-datasolutions.de

                                  1
My Background
• I am committer and PMC member of Apache Lucene and Solr. My
  main focus is on development of Lucene Java.
• Implemented fast numerical search and maintaining the new
  attribute-based text analysis API. Well known as Generics and
  Sophisticated Backwards Compatibility Policeman.
• Working as consultant and software architect for SD DataSolutions
  GmbH in Bremen, Germany. The main task is maintaining PANGAEA
  (Publishing Network for Geoscientific & Environmental Data) where
  I implemented the portal's geo-spatial retrieval functions with
  Apache Lucene Core.
• Talks about Lucene at various international conferences like the
  previous Lucene Revolution, Lucene Eurocon, ApacheCon EU/US,
  Berlin Buzzwords, and various local meetups.

                                                                      2
Agenda

• Motivation / History of Lucene

• AtomicReader & CompositeReader

• Reader contexts

• Wrap up

                                   3
Lucene Index Structure
• Lucene was the first full text search engine
  that supported document additions and
  updates
• Snapshot isolation ensures consistency


          Segmented index structure
  Committing changes creates new segments
                                                 4
Segments in Lucene




                      • Each index consists of various segments
                        placed in the index directory. All documents
                        are added to new new segment files, merged with other
                        on-disk files after flushing.
Image: Doug Cutting




                                   *) The term “optimal” does not mean all indexes must be optimized!
                                                                                                  5
Segments in Lucene




                      • Each index consists of various segments
                        placed in the index directory. All documents
                        are added to new new segment files, merged with other
                        on-disk files after flushing.
                      • Lucene writes segments incrementally and can merge
                        them.
Image: Doug Cutting




                                   *) The term “optimal” does not mean all indexes must be optimized!
                                                                                                  6
Segments in Lucene




                      • Each index consists of various segments
                        placed in the index directory. All documents
                        are added to new new segment files, merged with other
                        on-disk files after flushing.
                      • Lucene writes segments incrementally and can merge
                        them.
Image: Doug Cutting




                      • Optimized* index consists of one segment.
                                   *) The term “optimal” does not mean all indexes must be optimized!
                                                                                                  7
Lucene merges while indexing
   all of English Wikipedia




                                    Video by courtesy of:
                                                    8
                Mike McCandless’ blog, http://goo.gl/kI53f
Indexes in Lucene (up to version 3.6)
• Each segment (“atomic index”) is a completely
  functional index:
  – SegmentReader implements the IndexReader
    interface for single segments
• Composite indexes
  – DirectoryReader implements the IndexReader
    interface on top of a set of SegmentReaders
  – MultiReader is an abstraction of multiple
    IndexReaders combined to one virtual index

                                                  9
Composite Indexes (up to version 3.6)
Atomic “views” on multi-segment index:
  – Term Dictionary: on-the-fly merged & sorted term
    index (priority queue for TermEnum,…)
  – Postings: posting lists for each term appended,
    convert document IDs to be global
  – Metadata: doc frequency, term frequency,…
  – Stored fields, term vectors, deletions: delegate global
    document IDs -> segment document IDs (binary
    search)
  – FieldCache: duplicate instances for single segments
    and composite view (memory!!!)
                                                          10
Merging Term Index and Postings
                         Term 1

Term 1   Term 1          Term 2

Term 2   Term 3          Term 3

Term 4   Term 5          Term 4

Term 7   Term 6          Term 5

Term 8   Term 8          Term 6

Term 9   Term 9          Term 7
                         Term 8
                         Term 9


                                   11
Merging Term Index and Postings
                         Term 1

Term 1   Term 1          Term 2

Term 2   Term 3          Term 3

Term 4   Term 5          Term 4

Term 7   Term 6          Term 5

Term 8   Term 8          Term 6

Term 9   Term 9          Term 7
                         Term 8
                         Term 9


                                   12
Composite Indexes (up to version 3.6)
Atomic “views” on multi-segment index:
  – Term Dictionary: on-the-fly merged & sorted term
    index (priority queue for TermEnum,…)
  – Postings: posting lists for each term appended,
    convert document IDs to be global
  – Metadata: doc frequency, term frequency,…
  – Stored fields, term vectors, deletions: delegate
    global document IDs -> segment document IDs
    (binary search)
  – FieldCache: duplicate instances for single
    segments and composite view (memory!!!)
                                                   13
Searching before Version 2.9
• IndexSearcher used the underlying index
  always as a “single” (atomic) index:
  – Queries are executed on the atomic view of a
    composite index
  – Slowdown for queries that scan term dictionary
    (MultiTermQuery) or hit lots of documents,
    facetting




                                                          Image: © The Walt Disney Company
    => recommendation to “optimize” index
  – On every index change, FieldCache used for
    sorting had to be reloaded completely

                                                     14
Lucene 2.9 and later:
             Per segment search

Search is executed separately on each index segment:




             Atomic view no longer used!
                                                       15
Per Segment Search: Pros
• No live term dictionary merging
• Possibility to parallelize
   – ExecutorService in IndexSearcher since Lucene 3.1+
   – Do not optimize to make this work!

• Sorting only needs per-segment FieldCache
   – Cheap reopen after index changes!

• Filter cache per segment
   – Cheap reopen after index changes!
                                                          16
Per Segment Search: Cons
• Query/Filter API changes:
  – Scorer / Filter’s DocIdSet no longer use global
    document IDs
• Slower sorting by string terms
  – Term ords are only comparable inside each
    segment
  – String comparisons needed after segment
    traversal
  – Use numeric sorting if possible!!!
    (Lucene supports missing values since version 3.4 [buggy], corrected 3.5+)

                                                                                 17
Agenda

• Motivation / History of Lucene

• AtomicReader & CompositeReader

• Reader contexts

• Wrap up

                                   18
Composite Indexes (up to version 3.6)
Atomic “views” on multi-segment index:
  – Term Dictionary: on-the-fly merged & sorted term
    index (priority queue for TermEnum,…)
  – Postings: posting lists for each term appended,
    convert document IDs to be global
  – Metadata: doc frequency, term frequency,…
  – Stored fields, term vectors, deletions: delegate global
    document IDs -> segment document IDs (binary
    search)
  – FieldCache: duplicate instances for single segments
    and composite view (memory!!!)
                                                          19
Composite Indexes (version 4.0)
Atomic “views” on multi-segment index:
  – Term Dictionary: on-the-fly merged & sorted term
    index (priority queue for TermEnum,…)
  – Postings: posting lists for each term appended,
    convert document IDs to be global
  – Metadata: doc frequency, term frequency,…
  – Stored fields, term vectors, deletions: delegate global
    document IDs -> segment document IDs (binary
    search)
  – FieldCache: duplicate instances for single segments
    and composite view (memory!!!)
                                                          20
Early Lucene 4.0
                                   • Only “historic” IndexReader interface
                                     available since Lucene 1.0
                                   • 80% of all methods of composite IndexReaders
                                     throwed UnsupportedOperationException
                                     – This affected all user-facing APIs
                                       (SegmentReader was hidden / marked experimental)
                                   • No compile time safety!
Image: © The Walt Disney Company




                                     – Query Scorers and Filters need term dictionary and
                                       postings, throwing UOE when executed on composite
                                       reader


                                                                                          21
Image: © The Walt Disney Company




                                   Heavy Committing™ !!!




 22
IndexReader
•   Most generic (abstract) API to an index
     – superclass of more specific types
     – cannot be subclassed directly (no public constructor)




                                                               23
IndexReader
•   Most generic (abstract) API to an index
     – superclass of more specific types
     – cannot be subclassed directly (no public constructor)

•   Does not know anything about “inverted index” concept
     – no terms, no postings!




                                                               24
IndexReader
•   Most generic (abstract) API to an index
     – superclass of more specific types
     – cannot be subclassed directly (no public constructor)

•   Does not know anything about “inverted index” concept
     – no terms, no postings!

•   Access to stored fields (and term vectors) by document ID
     – to display / highlight search results only!




                                                                25
IndexReader
•   Most generic (abstract) API to an index
     – superclass of more specific types
     – cannot be subclassed directly (no public constructor)

•   Does not know anything about “inverted index” concept
     – no terms, no postings!

•   Access to stored fields (and term vectors) by document ID
     – to display / highlight search results only!

•   Some very limited metadata
     – number of documents,…




                                                                26
IndexReader
•   Most generic (abstract) API to an index
     – superclass of more specific types
     – cannot be subclassed directly (no public constructor)

•   Does not know anything about “inverted index” concept
     – no terms, no postings!

•   Access to stored fields (and term vectors) by document ID
     – to display / highlight search results only!

•   Some very limited metadata
     – number of documents,…

•   Unmodifiable: No more deletions / changing of norms possible!
     – Applies to all readers in Lucene 4.0 !!!




                                                                    27
IndexReader
•   Most generic (abstract) API to an index
     – superclass of more specific types
     – cannot be subclassed directly (no public constructor)

•   Does not know anything about “inverted index” concept
     – no terms, no postings!

•   Access to stored fields (and term vectors) by document ID
     – to display / highlight search results only!

•   Some very limited metadata
     – number of documents,…

•   Unmodifiable: No more deletions / changing of norms possible!
     – Applies to all readers in Lucene 4.0 !!!

•   Passed to IndexSearcher
     – just as before!

                                                                    28
IndexReader
•   Most generic (abstract) API to an index
     – superclass of more specific types
     – cannot be subclassed directly (no public constructor)

•   Does not know anything about “inverted index” concept
     – no terms, no postings!

•   Access to stored fields (and term vectors) by document ID
     – to display / highlight search results only!

•   Some very limited metadata
     – number of documents,…

•   Unmodifiable: No more deletions / changing of norms possible!
     – Applies to all readers in Lucene 4.0 !!!

•   Passed to IndexSearcher
     – just as before!

•   Allows IndexReader.open() for backwards compatibility (deprecated)   29
AtomicReader
• Inherits from IndexReader




                              30
AtomicReader
• Inherits from IndexReader
• Access to “atomic” indexes (single segments)




                                                 31
AtomicReader
• Inherits from IndexReader
• Access to “atomic” indexes (single segments)
• Full term dictionary and postings API




                                                 32
AtomicReader
• Inherits from IndexReader
• Access to “atomic” indexes (single segments)
• Full term dictionary and postings API




                                                 33
AtomicReader
• Inherits from IndexReader
• Access to “atomic” indexes (single segments)
• Full term dictionary and postings API
• Access to DocValues (new in Lucene 4.0) and
  norms



                                                 34
CompositeReader
• No additional functionality on top of
  IndexReader




                                          35
CompositeReader
• No additional functionality on top of
  IndexReader
• Provides getSequentialSubReaders() to
  retrieve all child readers




                                          36
CompositeReader
• No additional functionality on top of
  IndexReader
• Provides getSequentialSubReaders() to
  retrieve all child readers
• DirectoryReader and MultiReader implement
  this class



                                          37
DirectoryReader
• Abstract class, defines interface for:




                                           38
DirectoryReader
• Abstract class, defines interface for:
   – access to on-disk indexes (on top of Directory class)
   – access to commit points, index metadata, index
     version, isCurrent() for reopen support
   – defines abstract openIfChanged (for cheap reopening
     of indexes)
   – child readers are always AtomicReader instances




                                                         39
DirectoryReader
• Abstract class, defines interface for:
   – access to on-disk indexes (on top of Directory class)
   – access to commit points, index metadata, index version,
     isCurrent() for reopen support
   – defines abstract openIfChanged (for cheap reopening of
     indexes)
   – child readers are always AtomicReader instances
• Provides static factory methods for opening indexes
   – well-known from IndexReader in Lucene 1 to 3
   – factories return internal DirectoryReader implementation
     (StandardDirectoryReader with SegmentReaders as childs)

                                                               40
Basic Search Example
Image: © The Walt Disney Company




                                       Looks familiar, doesn’t it?

                                                                     41
“Arrgh! I still need terms and postings on my
DirectoryReader!!! What should I do? Optimize
to only have one segment? Help me please!!!”
          (example question on java-user@lucene.apache.org)




                                                              42
What to do?

• Calm down
• Take a break and drink a beer*
• Don’t optimize / force merge your index!!!




     *) Robert’s solution                      43
Solutions
• Most efficient way:
  – Retrieve atomic leaves from your composite:
    reader.getTopReaderContext().leaves()
  – Iterate over sub-readers, do the work
    (possibly parallelized)
  – Merge results




                                                  44
Solutions
• Most efficient way:
  – Retrieve atomic leaves from your composite:
    reader.getTopReaderContext().leaves()
  – Iterate over sub-readers, do the work
    (possibly parallelized)
  – Merge results


• Otherwise: wrap your CompositeReader:

                                                  45
(Slow) Solution

AtomicReader
  SlowCompositeReaderWrapper.wrap(IndexReader r)


• Wraps IndexReaders of any kind as atomic reader,
  providing terms, postings, deletions, doc values
   – Internally uses same algorithms like previous Lucene readers
   – Segment-merging uses this to merge segments, too




                                                                    46
(Slow) Solution

AtomicReader
  SlowCompositeReaderWrapper.wrap(IndexReader r)


• Wraps IndexReaders of any kind as atomic reader,
  providing terms, postings, deletions, doc values
   – Internally uses same algorithms like previous Lucene readers
   – Segment-merging uses this to merge segments, too


• Solr always provides an AtomicReader for convenience
  through SolrIndexSearcher. Plugin writers should use:
AtomicReader
  rd = mySolrIndexSearcher.getAtomicReader()
                                                                    47
Other readers
• FilterAtomicReader
  – Was FilterIndexReader in 3.x
    (but now solely works on atomic readers)
  – Allows to filter terms, postings, deletions
  – Useful for index splitters (e.g., PKIndexSplitter,
    MultiPassIndexSplitter)
    (provide own getLiveDocs() method, merge to IndexWriter)

• ParallelAtomicReader, -CompositeReader


                                                               48
IndexReader Reopening
• Reopening solely provided by directory-based
  DirectoryReader instances




                                             49
IndexReader Reopening
• Reopening solely provided by directory-based
  DirectoryReader instances
• No more reopen for:
  – AtomicReader: they are atomic, no refresh possible
  – MultiReader: reopen child readers separately, create
    new MultiReader on top of reopened readers
  – Parallel*Reader, FilterAtomicReader: reopen
    wrapped readers, create new wrapper afterwards


                                                           50
Agenda

• Motivation / History of Lucene

• AtomicReader & CompositeReader

• Reader contexts

• Wrap up

                                   51
IndexReaderContext
AtomicReaderContext
CompositeReaderContext


WTF ?!?
                         52
The Problem
              SuperComposite: MultiReader


    CompositeA:                     CompositeB:
   DirectoryReader                DirectoryReader
Atomic0   Atomic1   Atomic2    Atomic0   Atomic1   Atomic2

Doc0      Doc0      Doc0        Doc0     Doc0      Doc0
Doc1      Doc1      Doc1        Doc1     Doc1      Doc1
Doc2      Doc2      Doc2        Doc2     Doc2      Doc2
Doc3      Doc3      Doc3        Doc3     Doc3      Doc3




                                                             53
The Problem
              SuperComposite: MultiReader


    CompositeA:                     CompositeB:
   DirectoryReader                DirectoryReader
Atomic0   Atomic1   Atomic2    Atomic0   Atomic1   Atomic2

Doc0      Doc4      Doc8        Doc0     Doc0      Doc0
Doc1      Doc5      Doc9        Doc1     Doc1      Doc1
Doc2      Doc6      Doc10       Doc2     Doc2      Doc2
Doc3      Doc7      Doc11       Doc3     Doc3      Doc3




                                                             54
The Problem
              SuperComposite: MultiReader


    CompositeA:                     CompositeB:
   DirectoryReader                DirectoryReader
Atomic0   Atomic1   Atomic2    Atomic0   Atomic1   Atomic2

Doc0      Doc4      Doc8        Doc12    Doc16     Doc20

Doc1      Doc5      Doc9        Doc13    Doc17     Doc21

Doc2      Doc6      Doc10       Doc14    Doc18     Doc22

Doc3      Doc7      Doc11       Doc15    Doc19     Doc23




                                                             55
Solution
• Each IndexReader instance provides:
  IndexReaderContext getTopReaderContext()




                                             56
Solution
• Each IndexReader instance provides:
  IndexReaderContext getTopReaderContext()

• The context provides a “view” on all childs (direct
  descendants) and leaves (down to lowest level
  AtomicReaders)




                                                        57
Solution
• Each IndexReader instance provides:
  IndexReaderContext getTopReaderContext()

• The context provides a “view” on all childs (direct
  descendants) and leaves (down to lowest level
  AtomicReaders)
• Each atomic leave has a docBase (document ID
  offset)




                                                    58
Solution
• Each IndexReader instance provides:
  IndexReaderContext getTopReaderContext()

• The context provides a “view” on all childs (direct
  descendants) and leaves (down to lowest level
  AtomicReaders)
• Each atomic leave has a docBase (document ID offset)
• IndexSearcher passes a context instance relative to its
  own top-level reader to each Query-Scorer / Filter
   – allows to access complete reader tree up to the current top-
     level reader
   – Allows to get the “global” document ID
                                                              59
Agenda

• Motivation / History of Lucene

• AtomicReader & CompositeReader

• Reader contexts

• Wrap up

                                   60
Summary


• Up to Lucene 3.6 indexes are still accessible on
  any hierarchy level with same interface
• Lucene 4.0 will split the IndexReader class into
  several abstract interfaces
• IndexReaderContexts will support per-segment
  search preserving top-level document IDs
                                                     61
Summary
• Lucene moved from global searches to per-
  segment search in Lucene 2.9
• Up to Lucene 3.6 indexes are still accessible on
  any hierarchy level with same interface
• Lucene 4.0 will split the IndexReader class into
  several abstract interfaces
• IndexReaderContexts will support per-segment
  search preserving top-level document IDs
                                                     62
Summary


• Up to Lucene 3.6 indexes are still accessible on
  any hierarchy level with same interface
• Lucene 4.0 will split the IndexReader class into
  several abstract interfaces
• IndexReaderContexts will support per-segment
  search preserving top-level document IDs
                                                     63
Summary




• Lucene 4.0 will split the IndexReader class into
  several abstract interfaces
• IndexReaderContexts will support per-segment
  search preserving top-level document IDs
                                                     64
Summary




• Lucene 4.0 will split the IndexReader class into
  several abstract interfaces
• IndexReaderContexts will support per-segment
  search preserving top-level document IDs
                                                     65
Contact
                  Uwe Schindler
                uschindler@apache.org
                http://www.thetaphi.de
                        @thetaph1




    SD DataSolutions GmbH
         Wätjenstr. 49
    28213 Bremen, Germany
      +49 421 40889785-0
http://www.sd-datasolutions.de

                                         66

Weitere ähnliche Inhalte

Was ist angesagt?

Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScyllaDB
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Redis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRedis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRed Hat Developers
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesIsheeta Sanghi
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
A Kafka-based platform to process medical prescriptions of Germany’s health i...
A Kafka-based platform to process medical prescriptions of Germany’s health i...A Kafka-based platform to process medical prescriptions of Germany’s health i...
A Kafka-based platform to process medical prescriptions of Germany’s health i...HostedbyConfluent
 
PowerDNS-Admin vs DNS-UI
PowerDNS-Admin vs DNS-UIPowerDNS-Admin vs DNS-UI
PowerDNS-Admin vs DNS-UIbarbarousisk
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
High Availability Storage (susecon2016)
High Availability Storage (susecon2016)High Availability Storage (susecon2016)
High Availability Storage (susecon2016)Roger Zhou 周志强
 
Improve PostgreSQL replication with Oracle GoldenGate
Improve PostgreSQL replication with Oracle GoldenGateImprove PostgreSQL replication with Oracle GoldenGate
Improve PostgreSQL replication with Oracle GoldenGateBobby Curtis
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaDataWorks Summit
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsMydbops
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Vinoth Chandar
 
Kernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uringKernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uringAnne Nicolas
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkSamy Dindane
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 

Was ist angesagt? (20)

Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Redis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRedis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech Talk
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
A Kafka-based platform to process medical prescriptions of Germany’s health i...
A Kafka-based platform to process medical prescriptions of Germany’s health i...A Kafka-based platform to process medical prescriptions of Germany’s health i...
A Kafka-based platform to process medical prescriptions of Germany’s health i...
 
PowerDNS-Admin vs DNS-UI
PowerDNS-Admin vs DNS-UIPowerDNS-Admin vs DNS-UI
PowerDNS-Admin vs DNS-UI
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
High Availability Storage (susecon2016)
High Availability Storage (susecon2016)High Availability Storage (susecon2016)
High Availability Storage (susecon2016)
 
Improve PostgreSQL replication with Oracle GoldenGate
Improve PostgreSQL replication with Oracle GoldenGateImprove PostgreSQL replication with Oracle GoldenGate
Improve PostgreSQL replication with Oracle GoldenGate
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Kernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uringKernel Recipes 2019 - Faster IO through io_uring
Kernel Recipes 2019 - Faster IO through io_uring
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 

Andere mochten auch

Lucene with Bloom filtered segments
Lucene with Bloom filtered segmentsLucene with Bloom filtered segments
Lucene with Bloom filtered segmentsMark Harwood
 
Patterns for large scale search
Patterns for large scale searchPatterns for large scale search
Patterns for large scale searchMark Harwood
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solrlucenerevolution
 

Andere mochten auch (6)

Lucene with Bloom filtered segments
Lucene with Bloom filtered segmentsLucene with Bloom filtered segments
Lucene with Bloom filtered segments
 
Patterns for large scale search
Patterns for large scale searchPatterns for large scale search
Patterns for large scale search
 
Lucene KV-Store
Lucene KV-StoreLucene KV-Store
Lucene KV-Store
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solr
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 

Ähnlich wie Is Your Index Reader Really Atomic or Maybe Slow?

Portable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej BialeckiPortable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej Bialeckilucenerevolution
 
Munching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processingMunching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processingabial
 
Elasticsearch Architechture
Elasticsearch ArchitechtureElasticsearch Architechture
Elasticsearch ArchitechtureAnurag Sharma
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
Search Architecture at Evernote: Presented by Christian Kohlschütter, Evernote
Search Architecture at Evernote: Presented by Christian Kohlschütter, EvernoteSearch Architecture at Evernote: Presented by Christian Kohlschütter, Evernote
Search Architecture at Evernote: Presented by Christian Kohlschütter, EvernoteLucidworks
 
Illuminating Lucene.Net
Illuminating Lucene.NetIlluminating Lucene.Net
Illuminating Lucene.NetDean Thrasher
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchpmanvi
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
 
Glusterfs session #9 index xlator
Glusterfs session #9   index xlatorGlusterfs session #9   index xlator
Glusterfs session #9 index xlatorPranith Karampuri
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetuprcmuir
 
Introduction to couchbase
Introduction to couchbaseIntroduction to couchbase
Introduction to couchbaseDipti Borkar
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch BasicsShifa Khan
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1YI-CHING WU
 
Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.Timothy St. Clair
 
IntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and PerformanceIntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and Performanceintelliyole
 

Ähnlich wie Is Your Index Reader Really Atomic or Maybe Slow? (20)

Portable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej BialeckiPortable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej Bialecki
 
Munching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processingMunching & crunching - Lucene index post-processing
Munching & crunching - Lucene index post-processing
 
Flexible Indexing in Lucene 4.0
Flexible Indexing in Lucene 4.0Flexible Indexing in Lucene 4.0
Flexible Indexing in Lucene 4.0
 
Lucene indexing
Lucene indexingLucene indexing
Lucene indexing
 
Elasticsearch Architechture
Elasticsearch ArchitechtureElasticsearch Architechture
Elasticsearch Architechture
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Search Architecture at Evernote: Presented by Christian Kohlschütter, Evernote
Search Architecture at Evernote: Presented by Christian Kohlschütter, EvernoteSearch Architecture at Evernote: Presented by Christian Kohlschütter, Evernote
Search Architecture at Evernote: Presented by Christian Kohlschütter, Evernote
 
Illuminating Lucene.Net
Illuminating Lucene.NetIlluminating Lucene.Net
Illuminating Lucene.Net
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Elastic search
Elastic searchElastic search
Elastic search
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
Glusterfs session #9 index xlator
Glusterfs session #9   index xlatorGlusterfs session #9   index xlator
Glusterfs session #9 index xlator
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
 
Introduction to couchbase
Introduction to couchbaseIntroduction to couchbase
Introduction to couchbase
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1
 
Lucene 101
Lucene 101Lucene 101
Lucene 101
 
Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.
 
IntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and PerformanceIntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and Performance
 
Lucene basics
Lucene basicsLucene basics
Lucene basics
 

Mehr von lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Kürzlich hochgeladen

Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...marcuskenyatta275
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024Stephanie Beckett
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKUXDXConf
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreelreely ones
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 

Kürzlich hochgeladen (20)

Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreel
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 

Is Your Index Reader Really Atomic or Maybe Slow?

  • 1. Is Your IndexReader Really Atomic or Maybe Slow? Uwe Schindler SD DataSolutions GmbH, uschindler@sd-datasolutions.de 1
  • 2. My Background • I am committer and PMC member of Apache Lucene and Solr. My main focus is on development of Lucene Java. • Implemented fast numerical search and maintaining the new attribute-based text analysis API. Well known as Generics and Sophisticated Backwards Compatibility Policeman. • Working as consultant and software architect for SD DataSolutions GmbH in Bremen, Germany. The main task is maintaining PANGAEA (Publishing Network for Geoscientific & Environmental Data) where I implemented the portal's geo-spatial retrieval functions with Apache Lucene Core. • Talks about Lucene at various international conferences like the previous Lucene Revolution, Lucene Eurocon, ApacheCon EU/US, Berlin Buzzwords, and various local meetups. 2
  • 3. Agenda • Motivation / History of Lucene • AtomicReader & CompositeReader • Reader contexts • Wrap up 3
  • 4. Lucene Index Structure • Lucene was the first full text search engine that supported document additions and updates • Snapshot isolation ensures consistency  Segmented index structure  Committing changes creates new segments 4
  • 5. Segments in Lucene • Each index consists of various segments placed in the index directory. All documents are added to new new segment files, merged with other on-disk files after flushing. Image: Doug Cutting *) The term “optimal” does not mean all indexes must be optimized! 5
  • 6. Segments in Lucene • Each index consists of various segments placed in the index directory. All documents are added to new new segment files, merged with other on-disk files after flushing. • Lucene writes segments incrementally and can merge them. Image: Doug Cutting *) The term “optimal” does not mean all indexes must be optimized! 6
  • 7. Segments in Lucene • Each index consists of various segments placed in the index directory. All documents are added to new new segment files, merged with other on-disk files after flushing. • Lucene writes segments incrementally and can merge them. Image: Doug Cutting • Optimized* index consists of one segment. *) The term “optimal” does not mean all indexes must be optimized! 7
  • 8. Lucene merges while indexing all of English Wikipedia Video by courtesy of: 8 Mike McCandless’ blog, http://goo.gl/kI53f
  • 9. Indexes in Lucene (up to version 3.6) • Each segment (“atomic index”) is a completely functional index: – SegmentReader implements the IndexReader interface for single segments • Composite indexes – DirectoryReader implements the IndexReader interface on top of a set of SegmentReaders – MultiReader is an abstraction of multiple IndexReaders combined to one virtual index 9
  • 10. Composite Indexes (up to version 3.6) Atomic “views” on multi-segment index: – Term Dictionary: on-the-fly merged & sorted term index (priority queue for TermEnum,…) – Postings: posting lists for each term appended, convert document IDs to be global – Metadata: doc frequency, term frequency,… – Stored fields, term vectors, deletions: delegate global document IDs -> segment document IDs (binary search) – FieldCache: duplicate instances for single segments and composite view (memory!!!) 10
  • 11. Merging Term Index and Postings Term 1 Term 1 Term 1 Term 2 Term 2 Term 3 Term 3 Term 4 Term 5 Term 4 Term 7 Term 6 Term 5 Term 8 Term 8 Term 6 Term 9 Term 9 Term 7 Term 8 Term 9 11
  • 12. Merging Term Index and Postings Term 1 Term 1 Term 1 Term 2 Term 2 Term 3 Term 3 Term 4 Term 5 Term 4 Term 7 Term 6 Term 5 Term 8 Term 8 Term 6 Term 9 Term 9 Term 7 Term 8 Term 9 12
  • 13. Composite Indexes (up to version 3.6) Atomic “views” on multi-segment index: – Term Dictionary: on-the-fly merged & sorted term index (priority queue for TermEnum,…) – Postings: posting lists for each term appended, convert document IDs to be global – Metadata: doc frequency, term frequency,… – Stored fields, term vectors, deletions: delegate global document IDs -> segment document IDs (binary search) – FieldCache: duplicate instances for single segments and composite view (memory!!!) 13
  • 14. Searching before Version 2.9 • IndexSearcher used the underlying index always as a “single” (atomic) index: – Queries are executed on the atomic view of a composite index – Slowdown for queries that scan term dictionary (MultiTermQuery) or hit lots of documents, facetting Image: © The Walt Disney Company => recommendation to “optimize” index – On every index change, FieldCache used for sorting had to be reloaded completely 14
  • 15. Lucene 2.9 and later: Per segment search Search is executed separately on each index segment: Atomic view no longer used! 15
  • 16. Per Segment Search: Pros • No live term dictionary merging • Possibility to parallelize – ExecutorService in IndexSearcher since Lucene 3.1+ – Do not optimize to make this work! • Sorting only needs per-segment FieldCache – Cheap reopen after index changes! • Filter cache per segment – Cheap reopen after index changes! 16
  • 17. Per Segment Search: Cons • Query/Filter API changes: – Scorer / Filter’s DocIdSet no longer use global document IDs • Slower sorting by string terms – Term ords are only comparable inside each segment – String comparisons needed after segment traversal – Use numeric sorting if possible!!! (Lucene supports missing values since version 3.4 [buggy], corrected 3.5+) 17
  • 18. Agenda • Motivation / History of Lucene • AtomicReader & CompositeReader • Reader contexts • Wrap up 18
  • 19. Composite Indexes (up to version 3.6) Atomic “views” on multi-segment index: – Term Dictionary: on-the-fly merged & sorted term index (priority queue for TermEnum,…) – Postings: posting lists for each term appended, convert document IDs to be global – Metadata: doc frequency, term frequency,… – Stored fields, term vectors, deletions: delegate global document IDs -> segment document IDs (binary search) – FieldCache: duplicate instances for single segments and composite view (memory!!!) 19
  • 20. Composite Indexes (version 4.0) Atomic “views” on multi-segment index: – Term Dictionary: on-the-fly merged & sorted term index (priority queue for TermEnum,…) – Postings: posting lists for each term appended, convert document IDs to be global – Metadata: doc frequency, term frequency,… – Stored fields, term vectors, deletions: delegate global document IDs -> segment document IDs (binary search) – FieldCache: duplicate instances for single segments and composite view (memory!!!) 20
  • 21. Early Lucene 4.0 • Only “historic” IndexReader interface available since Lucene 1.0 • 80% of all methods of composite IndexReaders throwed UnsupportedOperationException – This affected all user-facing APIs (SegmentReader was hidden / marked experimental) • No compile time safety! Image: © The Walt Disney Company – Query Scorers and Filters need term dictionary and postings, throwing UOE when executed on composite reader 21
  • 22. Image: © The Walt Disney Company Heavy Committing™ !!! 22
  • 23. IndexReader • Most generic (abstract) API to an index – superclass of more specific types – cannot be subclassed directly (no public constructor) 23
  • 24. IndexReader • Most generic (abstract) API to an index – superclass of more specific types – cannot be subclassed directly (no public constructor) • Does not know anything about “inverted index” concept – no terms, no postings! 24
  • 25. IndexReader • Most generic (abstract) API to an index – superclass of more specific types – cannot be subclassed directly (no public constructor) • Does not know anything about “inverted index” concept – no terms, no postings! • Access to stored fields (and term vectors) by document ID – to display / highlight search results only! 25
  • 26. IndexReader • Most generic (abstract) API to an index – superclass of more specific types – cannot be subclassed directly (no public constructor) • Does not know anything about “inverted index” concept – no terms, no postings! • Access to stored fields (and term vectors) by document ID – to display / highlight search results only! • Some very limited metadata – number of documents,… 26
  • 27. IndexReader • Most generic (abstract) API to an index – superclass of more specific types – cannot be subclassed directly (no public constructor) • Does not know anything about “inverted index” concept – no terms, no postings! • Access to stored fields (and term vectors) by document ID – to display / highlight search results only! • Some very limited metadata – number of documents,… • Unmodifiable: No more deletions / changing of norms possible! – Applies to all readers in Lucene 4.0 !!! 27
  • 28. IndexReader • Most generic (abstract) API to an index – superclass of more specific types – cannot be subclassed directly (no public constructor) • Does not know anything about “inverted index” concept – no terms, no postings! • Access to stored fields (and term vectors) by document ID – to display / highlight search results only! • Some very limited metadata – number of documents,… • Unmodifiable: No more deletions / changing of norms possible! – Applies to all readers in Lucene 4.0 !!! • Passed to IndexSearcher – just as before! 28
  • 29. IndexReader • Most generic (abstract) API to an index – superclass of more specific types – cannot be subclassed directly (no public constructor) • Does not know anything about “inverted index” concept – no terms, no postings! • Access to stored fields (and term vectors) by document ID – to display / highlight search results only! • Some very limited metadata – number of documents,… • Unmodifiable: No more deletions / changing of norms possible! – Applies to all readers in Lucene 4.0 !!! • Passed to IndexSearcher – just as before! • Allows IndexReader.open() for backwards compatibility (deprecated) 29
  • 31. AtomicReader • Inherits from IndexReader • Access to “atomic” indexes (single segments) 31
  • 32. AtomicReader • Inherits from IndexReader • Access to “atomic” indexes (single segments) • Full term dictionary and postings API 32
  • 33. AtomicReader • Inherits from IndexReader • Access to “atomic” indexes (single segments) • Full term dictionary and postings API 33
  • 34. AtomicReader • Inherits from IndexReader • Access to “atomic” indexes (single segments) • Full term dictionary and postings API • Access to DocValues (new in Lucene 4.0) and norms 34
  • 35. CompositeReader • No additional functionality on top of IndexReader 35
  • 36. CompositeReader • No additional functionality on top of IndexReader • Provides getSequentialSubReaders() to retrieve all child readers 36
  • 37. CompositeReader • No additional functionality on top of IndexReader • Provides getSequentialSubReaders() to retrieve all child readers • DirectoryReader and MultiReader implement this class 37
  • 38. DirectoryReader • Abstract class, defines interface for: 38
  • 39. DirectoryReader • Abstract class, defines interface for: – access to on-disk indexes (on top of Directory class) – access to commit points, index metadata, index version, isCurrent() for reopen support – defines abstract openIfChanged (for cheap reopening of indexes) – child readers are always AtomicReader instances 39
  • 40. DirectoryReader • Abstract class, defines interface for: – access to on-disk indexes (on top of Directory class) – access to commit points, index metadata, index version, isCurrent() for reopen support – defines abstract openIfChanged (for cheap reopening of indexes) – child readers are always AtomicReader instances • Provides static factory methods for opening indexes – well-known from IndexReader in Lucene 1 to 3 – factories return internal DirectoryReader implementation (StandardDirectoryReader with SegmentReaders as childs) 40
  • 41. Basic Search Example Image: © The Walt Disney Company Looks familiar, doesn’t it? 41
  • 42. “Arrgh! I still need terms and postings on my DirectoryReader!!! What should I do? Optimize to only have one segment? Help me please!!!” (example question on java-user@lucene.apache.org) 42
  • 43. What to do? • Calm down • Take a break and drink a beer* • Don’t optimize / force merge your index!!! *) Robert’s solution 43
  • 44. Solutions • Most efficient way: – Retrieve atomic leaves from your composite: reader.getTopReaderContext().leaves() – Iterate over sub-readers, do the work (possibly parallelized) – Merge results 44
  • 45. Solutions • Most efficient way: – Retrieve atomic leaves from your composite: reader.getTopReaderContext().leaves() – Iterate over sub-readers, do the work (possibly parallelized) – Merge results • Otherwise: wrap your CompositeReader: 45
  • 46. (Slow) Solution AtomicReader SlowCompositeReaderWrapper.wrap(IndexReader r) • Wraps IndexReaders of any kind as atomic reader, providing terms, postings, deletions, doc values – Internally uses same algorithms like previous Lucene readers – Segment-merging uses this to merge segments, too 46
  • 47. (Slow) Solution AtomicReader SlowCompositeReaderWrapper.wrap(IndexReader r) • Wraps IndexReaders of any kind as atomic reader, providing terms, postings, deletions, doc values – Internally uses same algorithms like previous Lucene readers – Segment-merging uses this to merge segments, too • Solr always provides an AtomicReader for convenience through SolrIndexSearcher. Plugin writers should use: AtomicReader rd = mySolrIndexSearcher.getAtomicReader() 47
  • 48. Other readers • FilterAtomicReader – Was FilterIndexReader in 3.x (but now solely works on atomic readers) – Allows to filter terms, postings, deletions – Useful for index splitters (e.g., PKIndexSplitter, MultiPassIndexSplitter) (provide own getLiveDocs() method, merge to IndexWriter) • ParallelAtomicReader, -CompositeReader 48
  • 49. IndexReader Reopening • Reopening solely provided by directory-based DirectoryReader instances 49
  • 50. IndexReader Reopening • Reopening solely provided by directory-based DirectoryReader instances • No more reopen for: – AtomicReader: they are atomic, no refresh possible – MultiReader: reopen child readers separately, create new MultiReader on top of reopened readers – Parallel*Reader, FilterAtomicReader: reopen wrapped readers, create new wrapper afterwards 50
  • 51. Agenda • Motivation / History of Lucene • AtomicReader & CompositeReader • Reader contexts • Wrap up 51
  • 53. The Problem SuperComposite: MultiReader CompositeA: CompositeB: DirectoryReader DirectoryReader Atomic0 Atomic1 Atomic2 Atomic0 Atomic1 Atomic2 Doc0 Doc0 Doc0 Doc0 Doc0 Doc0 Doc1 Doc1 Doc1 Doc1 Doc1 Doc1 Doc2 Doc2 Doc2 Doc2 Doc2 Doc2 Doc3 Doc3 Doc3 Doc3 Doc3 Doc3 53
  • 54. The Problem SuperComposite: MultiReader CompositeA: CompositeB: DirectoryReader DirectoryReader Atomic0 Atomic1 Atomic2 Atomic0 Atomic1 Atomic2 Doc0 Doc4 Doc8 Doc0 Doc0 Doc0 Doc1 Doc5 Doc9 Doc1 Doc1 Doc1 Doc2 Doc6 Doc10 Doc2 Doc2 Doc2 Doc3 Doc7 Doc11 Doc3 Doc3 Doc3 54
  • 55. The Problem SuperComposite: MultiReader CompositeA: CompositeB: DirectoryReader DirectoryReader Atomic0 Atomic1 Atomic2 Atomic0 Atomic1 Atomic2 Doc0 Doc4 Doc8 Doc12 Doc16 Doc20 Doc1 Doc5 Doc9 Doc13 Doc17 Doc21 Doc2 Doc6 Doc10 Doc14 Doc18 Doc22 Doc3 Doc7 Doc11 Doc15 Doc19 Doc23 55
  • 56. Solution • Each IndexReader instance provides: IndexReaderContext getTopReaderContext() 56
  • 57. Solution • Each IndexReader instance provides: IndexReaderContext getTopReaderContext() • The context provides a “view” on all childs (direct descendants) and leaves (down to lowest level AtomicReaders) 57
  • 58. Solution • Each IndexReader instance provides: IndexReaderContext getTopReaderContext() • The context provides a “view” on all childs (direct descendants) and leaves (down to lowest level AtomicReaders) • Each atomic leave has a docBase (document ID offset) 58
  • 59. Solution • Each IndexReader instance provides: IndexReaderContext getTopReaderContext() • The context provides a “view” on all childs (direct descendants) and leaves (down to lowest level AtomicReaders) • Each atomic leave has a docBase (document ID offset) • IndexSearcher passes a context instance relative to its own top-level reader to each Query-Scorer / Filter – allows to access complete reader tree up to the current top- level reader – Allows to get the “global” document ID 59
  • 60. Agenda • Motivation / History of Lucene • AtomicReader & CompositeReader • Reader contexts • Wrap up 60
  • 61. Summary • Up to Lucene 3.6 indexes are still accessible on any hierarchy level with same interface • Lucene 4.0 will split the IndexReader class into several abstract interfaces • IndexReaderContexts will support per-segment search preserving top-level document IDs 61
  • 62. Summary • Lucene moved from global searches to per- segment search in Lucene 2.9 • Up to Lucene 3.6 indexes are still accessible on any hierarchy level with same interface • Lucene 4.0 will split the IndexReader class into several abstract interfaces • IndexReaderContexts will support per-segment search preserving top-level document IDs 62
  • 63. Summary • Up to Lucene 3.6 indexes are still accessible on any hierarchy level with same interface • Lucene 4.0 will split the IndexReader class into several abstract interfaces • IndexReaderContexts will support per-segment search preserving top-level document IDs 63
  • 64. Summary • Lucene 4.0 will split the IndexReader class into several abstract interfaces • IndexReaderContexts will support per-segment search preserving top-level document IDs 64
  • 65. Summary • Lucene 4.0 will split the IndexReader class into several abstract interfaces • IndexReaderContexts will support per-segment search preserving top-level document IDs 65
  • 66. Contact Uwe Schindler uschindler@apache.org http://www.thetaphi.de @thetaph1 SD DataSolutions GmbH Wätjenstr. 49 28213 Bremen, Germany +49 421 40889785-0 http://www.sd-datasolutions.de 66