8. 7.X: Autoscaling
• 7.4: - Periodic housekeeping task: cleans up inactive shards
       - Index size trigger: document count or size in bytes
• 7.5: - Policy replica attribute: #ALL, #EQUAL, percentage, range, and floating point values
       - Policy cores attribute: #EQUAL, percentage, range, and floating point values
       - Percentage in freedisk policy attribute
       - Simulation framework: test scaling up to 1 billion docs
9. 7.X: Cross Data Center Replication
• 7.2: Support bi-directional syncing of CDCR clusters
This is not active-active, but rather passive-active or active-passive: only one active cluster at a time.
10. 7.X: Time Routed Aliases
• 7.3: - Specialization of Solr's collection alias feature
       - Support time series data, e.g. logs / sensor data
       - Maintain performance under continuous indexing
       - CREATEALIAS: start, interval, retention policy
       - Automatically create new collections
       - Automatically delete old collections (optional)
       - Route updates based on timestamp
       - Search against all aliased collections*
• 7.5: Preemptively create the next collection when updates are near the latest collection's end date (optional)
* Pending optimization: minimize queried collections (SOLR-9562)
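The routing, preemptive-creation, and retention ideas on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Solr's implementation: the function names and the `<alias>_<YYYY-MM-DD>` collection naming are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def route_collection(alias, start, interval, timestamp):
    """Name the collection whose time slot contains `timestamp`.

    The `<alias>_<YYYY-MM-DD>` naming is an assumption made for this sketch.
    """
    if timestamp < start:
        raise ValueError("timestamp is older than the alias start")
    # Whole intervals elapsed since the alias start select the slot.
    slot = (timestamp - start) // interval
    slot_start = start + slot * interval
    return f"{alias}_{slot_start:%Y-%m-%d}"


def should_precreate(latest_start, interval, timestamp, window):
    """True when `timestamp` falls within `window` of the latest slot's end."""
    latest_end = latest_start + interval
    return latest_end - window <= timestamp < latest_end


def expired(slot_start, interval, now, retention):
    """True when a slot ended more than `retention` ago (optional deletion)."""
    return slot_start + interval < now - retention


start = datetime(2018, 10, 1, tzinfo=timezone.utc)
day = timedelta(days=1)
doc_time = datetime(2018, 10, 3, 14, 30, tzinfo=timezone.utc)
print(route_collection("logs", start, day, doc_time))  # logs_2018-10-03
```

With a one-day interval, a document stamped on the afternoon of 3 October lands in the third slot after the start, which is why only the start and interval are needed to route an update.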
11. 7.X: Replica types
• 7.0:
Type         Indexes  Supports soft  Pulls segments  Writes   Can become    Queryable
             locally  commit & RTG   from leader     to TLog  shard leader
NRT          ✓        ✓              -               ✓        ✓             ✓
TLOG leader  ✓        ✓              -               ✓        ✓             ✓
TLOG         -        -              ✓               ✓        ✓             ✓
PULL         -        -              ✓               -        -             ✓
• 7.4: Query param to prioritize replicas by type, e.g. shards.preference=replica.type:PULL,replica.type:TLOG
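As a rough illustration of the 7.4 query parameter, the following Python sketch assembles a select URL that lists replica types in order of preference. The helper name, host, and collection are hypothetical; only the `shards.preference` parameter and its `replica.type:` syntax come from the slide.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, query, replica_types):
    """Build a select URL that prefers replicas in the given type order."""
    # Join the types into the comma-separated replica.type:<TYPE> form.
    preference = ",".join(f"replica.type:{t}" for t in replica_types)
    params = {"q": query, "shards.preference": preference}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
url = select_url("http://localhost:8983/solr", "logs", "level:ERROR", ["PULL", "TLOG"])
print(url)
```

Listing PULL first keeps query load away from replicas that also index, which is the usual motivation for this ordering.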
12. 7.X: Streaming expressions
• Parallel computation function suite
• Some use cases: MapReduce, aggregations, parallel SQL, pub/sub messaging, graph traversal, machine learning, statistical programming
• Each 7.X release has added many new functions
• 7.5: Ref guide: Math Expressions User Guide
13. 7.X: JSON Facet API
• 7.0: Terms facets: added optional refinement support
• 7.4: Semantic Knowledge Graph support via new relatedness() aggregate function
  • Finds ad-hoc relationships by scoring documents relative to foreground and background document sets
• 7.5: Heatmap facet support
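To make the foreground/background idea concrete, here is a hedged Python sketch that builds a JSON Facet request body using relatedness(). The helper name, the facet label `related_terms`, the stat label `r1`, and the example field and queries are all assumptions for illustration; the body would be POSTed to a collection's query endpoint.

```python
import json


def relatedness_request(query, fore, back, field, limit=5):
    """Build a JSON Facet body that ranks terms of `field` by relatedness."""
    return {
        "query": query,
        # Foreground and background sets are passed as named parameters.
        "params": {"fore": fore, "back": back},
        "facet": {
            "related_terms": {
                "type": "terms",
                "field": field,
                "limit": limit,
                # Sort buckets by the relatedness score computed below.
                "sort": {"r1": "desc"},
                "facet": {"r1": "relatedness($fore,$back)"},
            }
        },
    }


# Hypothetical field names and queries, for illustration only.
body = relatedness_request("*:*", "age:[35 TO *]", "*:*", "hobby_s")
print(json.dumps(body, indent=2))
```

Each term bucket is scored by how much more often it occurs in the foreground set than in the background set, so the highest-ranked terms are the ones most characteristic of the foreground documents.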
15. 7.X: Text analysis / machine learning
• 7.1: Bengali normalizer and stemmer
• 7.2: Enable off-ZooKeeper storage of large (>1MB) LTR models
• 7.3: OpenNLP integration: tokenization, POS tagging, phrase chunking, lemmatization, NER, language detection
• 7.4: - ProtectedTermFilterFactory: don't filter protected terms
       - TaggerRequestHandler (a.k.a. SolrTextTagger): NER
• 7.5: - "nori" Korean morphological text analysis: "*_txt_ko"
       - PhrasesIdentificationComponent: identify and score candidate query phrases based on index statistics
       - UIMA integration removed
16. 7.X: Collections API
• 7.3: Add collection level properties similar to cluster properties
• 7.4: Cluster-wide defaults for numShards, nrtReplicas, tlogReplicas, pullReplicas
• 7.5: - Support co-locating replicas of two or more collections together in a node via the withCollection parameter to the CREATE and MODIFYCOLLECTION commands
       - SPLITSHARD: New split method using hard links: splitMethod=link
  • 3-5 times faster than the original splitMethod=rewrite
  • Slows down replication
  • Increases disk usage on replica nodes
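A SPLITSHARD call selecting the new split method might be assembled as in the sketch below. The helper name, host, collection, and shard names are hypothetical; the `splitMethod` values `link` and `rewrite` are the ones named on the slide.

```python
from urllib.parse import urlencode


def splitshard_url(base_url, collection, shard, split_method="rewrite"):
    """Build a Collections API URL for SPLITSHARD with a chosen split method."""
    if split_method not in ("rewrite", "link"):
        raise ValueError(f"unknown splitMethod: {split_method}")
    params = {
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
        "splitMethod": split_method,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host, collection, and shard, for illustration only.
link_url = splitshard_url("http://localhost:8983/solr", "logs", "shard1", "link")
print(link_url)
```

Given the trade-offs listed above, `link` suits cases where split latency matters more than replication speed or disk usage on replica nodes.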
18. 7.X: Queries
• 7.2: New synonymQueryStyle field type option: enable generation of appropriate queries for hierarchical relations between overlapping terms
  • as_same_term (default): SynonymQuery(bird,robin)
  • pick_best: Dismax(bird,robin)
  • as_distinct_terms: (bird OR robin)
• 7.4: JSON query DSL: Enable query/filter tagging, e.g. { "#colorfilt" : "color:blue" } equivalent to local-param {!tag=colorfilt}color:blue
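The tagging syntax is mainly useful for multi-select faceting, where a facet ignores the filter that shares its tag. The Python sketch below builds such a request body; the helper name and the facet label are hypothetical, and the use of `excludeTags` in the facet domain is an assumption about how the tag would typically be consumed.

```python
import json


def tagged_filter_request(tag, filter_query, facet_field):
    """Build a JSON request whose filter is tagged and excluded from one facet."""
    return {
        "query": "*:*",
        # "#<tag>" as the key tags the filter, like {!tag=<tag>} in local-params.
        "filter": [{f"#{tag}": filter_query}],
        "facet": {
            f"{facet_field}_counts": {
                "type": "terms",
                "field": facet_field,
                # Count this facet as if the tagged filter were not applied.
                "domain": {"excludeTags": tag},
            }
        },
    }


request = tagged_filter_request("colorfilt", "color:blue", "color")
print(json.dumps(request, indent=2))
```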
19. 7.X: Large index segment merging
• Problem: Overly large segments (e.g. as a result of force-merge/optimize) stop being eligible for merging, and can start accumulating >50% deleted documents, wasting space and skewing index stats.
• 7.5: - TieredMergePolicy now respects maxSegmentSizeMB by default when executing force-merge/optimize and expunge-deletes
       - TieredMergePolicy's reclaimDeletesWeight has been replaced with a new deletesPctAllowed setting to control how aggressively deletes should be reclaimed
20. 7.X: Replication/recovery/rolling upgrades
• 7.3: The old Leader-Initiated-Recovery (LIR) implementation is deprecated and replaced
  • To perform a rolling upgrade to Solr 8, you must be on Solr 7.3 or higher
• 7.4: - IndexFetcher now skips fetching identical files
       - Buffering updates are written to a separate TLog
       - Parallel replay of buffering TLogs
21. 7.X: Block-join / nested documents
• 7.3: Added filters and excludeTags local-params for {!parent} and {!child} query parsers, usable for multi-select faceting
• 7.5: WIP: Allow Solr to more faithfully represent deeply nested document relationships, rather than requiring reconstruction based on the flattened list of child docs returned by Solr
22. 7.X: Miscellaneous
• 7.3: add-distinct atomic updates
• 7.4: - Ignore large document URP
       - TLog: maxSize auto hard-commit setting (in addition to maxDocs & maxTime)
• 7.5: Custom cluster properties allowed with ext. prefix
24. 8.0: Autoscaling
• Suggestions API: rebalance options even if no violations
• Suggestions API: add-replica for lost replicas
• maxOps limit for index size trigger
• Autoscaling policy framework will be the default replica placement strategy
25. 8.0: Index upgrades
• 7.0: Lucene indexes record the major Lucene version that created the index, and the minimum Lucene version that contributed to segments.
• 8.0: Version N-2 or older indexes will now fail to open, even if they have been merged into an N-1 index.
  • IndexUpgrader will not upgrade 6.X or earlier indexes
  • Re-indexing will be required to upgrade
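The compatibility rule above reduces to a comparison of major versions. The following Python sketch expresses it under the assumption that the index's recorded creation version is available as an integer; the function name is hypothetical and this is not Lucene's actual check.

```python
def can_open_index(created_major, runtime_major):
    """Apply the N-1 rule: an index opens only if it was created by the
    current major version (N) or the previous one (N-1)."""
    return runtime_major - 1 <= created_major <= runtime_major


for created in (6, 7, 8):
    status = "opens" if can_open_index(created, 8) else "fails to open"
    print(f"index created by {created}.x on Lucene 8: {status}")
```

Because the check uses the version that created the index rather than the version that last merged it, an index that started on 6.x and was merged under 7.x still fails under 8.0, which is why re-indexing is required.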
26. 8.0: HTTP/2
• May 2018: Mark Miller announced his Star Burst effort: many cleanups and performance enhancements
• July 2018: Cao Manh Dat took up the HTTP/2 aspects: SOLR-12639
• Indexing test: 33M docs, 1 shard, 2 replicas (SOLR-12642)
  • Garbage: Leader: 26% less; replica: 76% less
  • Indexing throughput: 54% higher
  • CPU time: Leader: 39% higher; replica: 76% lower
• Ready to merge back to master, pending release of Jetty 9.4.13, containing SPNEGO HTTP/2 implementation
27. 8.0: Miscellaneous
• Lucene: scores must be non-negative
  • Function(Score)Query-s convert negative scores to zero
• TODO: remove deprecations
  • Trie fields? Removal effectively blocked by:
    • SOLR-12074: Add numeric equivalent to StrField
    • SOLR-11127: Mechanism to migrate the .system collection (a.k.a. blob store) schema from Trie (pre-7.0) to Points (7.0+)
29. 8.X: Lucene/Solr minimum JDK
• Oracle will end free JDK 8 support in January 2019
• Both JDK 9 & 10 are already EOL, no more Oracle support
• JDK 11 will very likely be the next minimum supported JDK, no schedule yet
• Under JDK 9+, Solr's Hadoop-related functionality has problems, including with Kerberos
• Uwe Schindler's Jenkins server tests Lucene/Solr on Oracle 9+10+11+12 JDKs
  • All have higher Solr test failure rates than on JDK 8
30. 8.X: Luke: UI framework & licensing
• Andrzej Bialecki: Initial implementation: Thinlet, GPL
• Mark Harwood: GWT
• Mark Miller: Apache Pivot
• Dmitry Kan and Tomoko Uchida took ownership on GitHub
• Tomoko Uchida: JavaFX (bundled w/JDK 8)
• LUCENE-2562: Make Luke a Lucene/Solr Module
  • JavaFX/OpenJFX unbundled from Java 11 JDK, GPL + Classpath Exception
  • Tomoko Uchida: Swing (7.5 release available)
31. 8.X: New Lucene features
• Index impacts, Block-Max WAND, similarity cleanups
  • Some queries (especially term queries and disjunctions) are much faster when number of hits is not required
• FeatureField: incorporate static relevance signals, e.g. PageRank
• Soft deletes
  • Merge policy retains deleted docs according to policy
  • Enables document history, e.g. for time-travel indexes
• RAMDirectory replaced by ByteBuffersDirectory