2. Who am I?
• Anshum Gupta, Apache Lucene/Solr committer, Lucidworks employee.
• Search and related stuff for 9+ years.
• Apache Lucene since 2006 and Solr since 2010.
• Organizations I am or have been a part of:
6. Example is now Server
• No default collection1
• Configset options
• ant example is now ant server
• post.sh
7. Posting documents has never been so easy!
• The bin/post script wraps the improved SimplePostTool
• Index JSON directly, out of the box (OTB)
• Developers: SolrServer is now SolrClient
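A minimal sketch of indexing JSON over plain HTTP, assuming a local Solr 5.x node on the default port 8983 and a hypothetical collection named mycollection. It only builds a POST of a JSON array of documents to the /update handler with a JSON content type and does not send it; it is not how bin/post or SolrClient work internally.

```python
import json
import urllib.request

# Assumptions: local Solr on the default port; "mycollection" is hypothetical.
SOLR_BASE = "http://localhost:8983/solr"

def build_json_update(collection, docs, commit=True):
    """Build (but do not send) a POST of a JSON array of docs to /update."""
    url = f"{SOLR_BASE}/{collection}/update"
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_json_update("mycollection", [{"id": "1", "title": "Solr 5.0"}])
print(req.get_method(), req.full_url)
# To actually index against a running node: urllib.request.urlopen(req)
```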
9. Managing Solr Configuration - Application
• Paramsets: Add/Edit
• initParams: generic appends, invariants and defaults, defined outside of the component they apply to
• Schema API: REST API for adding field types and dynamic fields
• Managing requestHandlers through API
• Implicit registration of the replication, get and admin handlers
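A sketch of the request bodies these APIs take, modeled on the examples in the Solr Reference Guide for the Request Parameters API (/config/params), the bulk Schema API (/schema) and the Config API (/config). Command names and JSON shapes should be checked against the guide for the exact release in use; the collection, paramset, field type, dynamic field and handler names are all hypothetical. The code only builds (URL, body) pairs and sends nothing; each body would be POSTed with a JSON content type.

```python
import json

# Assumptions: a local node and a hypothetical collection; every name below
# (paramset, field type, dynamic field, handler path) is illustrative.
BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"

def command(api, payload):
    """Return (url, JSON body) for a POST to /solr/<collection>/<api>."""
    return f"{BASE}/{COLLECTION}/{api}", json.dumps(payload)

# Paramsets: store a named set of request parameters.
paramset = command("config/params", {
    "set": {"myQueries": {"defType": "edismax", "rows": "5"}},
})

# Schema API (requires a managed schema): a field type and a dynamic field using it.
schema = command("schema", {
    "add-field-type": {
        "name": "text_lower",
        "class": "solr.TextField",
        "analyzer": {
            "tokenizer": {"class": "solr.WhitespaceTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    },
    "add-dynamic-field": {"name": "*_lower", "type": "text_lower", "stored": True},
})

# Config API: register a request handler that applies the paramset above.
handler = command("config", {
    "add-requesthandler": {
        "name": "/mysearch",
        "class": "solr.SearchHandler",
        "useParams": "myQueries",
    },
})

for url, body in (paramset, schema, handler):
    print("POST", url, body[:60])
```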
10. Managing the cluster - Systems
• Collection APIs
• BALANCESHARDUNIQUE: Even distribution of custom replica properties
• Improved APIs
• Option to not shuffle the nodeSet specified when creating a collection
• Logging
• Transaction log replay status
• Slow request logging (optional)
• Support for editing common solrconfig.xml values
• Scripts to support installing and running Solr as a service on Linux.
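A sketch of the two Collections API calls above as URLs, assuming a local node and the parameter names given in the Collections API reference (createNodeSet.shuffle for CREATE, property for BALANCESHARDUNIQUE, commonly used with preferredLeader). Collection, host and node names are hypothetical. The balance_unique function is a toy greedy illustration of what "even distribution" means, not Solr's actual algorithm, and greedy assignment is not guaranteed optimal.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: a local node; collection and node names below are hypothetical.
ADMIN = "http://localhost:8983/solr/admin/collections"

def collections_call(action, params):
    """Build a Collections API URL; a dict allows dotted names like createNodeSet.shuffle."""
    return ADMIN + "?" + urlencode({"action": action, **params})

# CREATE on an explicit node set, keeping the given node order (no shuffle).
create_url = collections_call("CREATE", {
    "name": "mycollection",
    "numShards": 2,
    "replicationFactor": 2,
    "createNodeSet": "host1:8983_solr,host2:8983_solr",
    "createNodeSet.shuffle": "false",
})

# Spread the preferredLeader property evenly: one replica per shard gets it.
balance_url = collections_call("BALANCESHARDUNIQUE", {
    "collection": "mycollection",
    "property": "preferredLeader",
})

def balance_unique(shards):
    """Toy greedy illustration (not Solr's code): pick one replica node per
    shard, preferring the node that holds the property least often so far."""
    load, chosen = {}, {}
    for shard in sorted(shards):
        node = min(shards[shard], key=lambda n: (load.get(n, 0), n))
        chosen[shard] = node
        load[node] = load.get(node, 0) + 1
    return chosen

print(parse_qs(urlparse(create_url).query)["createNodeSet.shuffle"])
print(balance_unique({"shard1": ["n1", "n2"], "shard2": ["n1", "n2"], "shard3": ["n2", "n3"]}))
```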
11. Keeping Solr Instance(s) Stable
• ReplicationHandler now has an option to throttle the speed of replication
• timeAllowed is now respected more widely: query expansion, document collection and LBHttpSolrClient retries
• Finite default timeouts for select and update requests
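The slide does not say how replication throttling works internally. The sketch below is a generic pause-based throughput limiter, not Solr's ReplicationHandler code: after each chunk it sleeps just long enough that the average rate stays at or below the target. ThroughputLimiter, its mb_per_sec parameter and the FakeClock used to make the demo deterministic are hypothetical names.

```python
import time

class ThroughputLimiter:
    """Pause-based throttle (hypothetical helper): after each chunk, sleep
    just long enough that the average rate stays at or below mb_per_sec."""

    def __init__(self, mb_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.bytes_per_sec = mb_per_sec * 1024 * 1024
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.total = 0

    def pause(self, nbytes):
        """Record nbytes just written; sleep if ahead of schedule. Returns seconds slept."""
        self.total += nbytes
        ahead_by = self.total / self.bytes_per_sec - (self.clock() - self.start)
        if ahead_by > 0:
            self.sleep(ahead_by)
            return ahead_by
        return 0.0

class FakeClock:
    """Deterministic clock for the demo: sleeping only advances simulated time."""
    def __init__(self):
        self.t = 0.0
    def now(self):
        return self.t
    def sleep(self, seconds):
        self.t += seconds

clock = FakeClock()
limiter = ThroughputLimiter(2, clock=clock.now, sleep=clock.sleep)  # 2 MB/s
for _ in range(4):                 # four 1 MB chunks that "copy" instantly
    limiter.pause(1024 * 1024)
print(clock.t)                     # prints 2.0: 4 MB at 2 MB/s
```

Injecting the clock and sleep functions keeps the limiter testable; in real use the defaults (time.monotonic, time.sleep) apply.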
13.
• Splitting of ClusterState
• Every collection has its own cluster state
• No need to watch what everyone else is doing
• Might be the default in 5.0
• Improved Solr-ZooKeeper communication
• Faster Overseer operations by avoiding cluster state reads from ZooKeeper at the start of each loop
• Better default timeouts to operate at a large scale
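A simplified model of why splitting the cluster state helps, assuming the ZooKeeper layout Solr uses for the two formats: one shared /clusterstate.json (stateFormat=1) versus one /collections/<name>/state.json per collection (stateFormat=2). The functions and the collection names logs and products are hypothetical, and the model ignores other details of Solr's ZooKeeper usage.

```python
# Simplified model: which cluster-state nodes a Solr node watches in ZooKeeper,
# and whether a change to some collection wakes that node up.

def state_paths(collections, state_format):
    """ZooKeeper paths that hold cluster state for the given collections."""
    if state_format == 1:
        return {"/clusterstate.json"}          # one shared node for everything
    return {f"/collections/{c}/state.json" for c in collections}

def watched_paths(hosted, state_format):
    """Paths a node watches: the shared file, or only its own collections."""
    return state_paths(hosted, state_format)

def is_notified(hosted, changed_collection, state_format):
    """Does a state change in changed_collection fire a watch on this node?"""
    if state_format == 1:
        return True                            # every change rewrites the shared file
    return changed_collection in hosted

hosted = {"logs"}
print(watched_paths(hosted, 1), watched_paths(hosted, 2))
print(is_notified(hosted, "products", 1), is_notified(hosted, "products", 2))
```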
16. Distributed IDF
• Multiple contributors over almost 5 years.
• 4 implementations OTB:
• LocalStatsCache: Local Stats
• ExactStatsCache: One time use aggregation
• ExactSharedStatsCache: Stats shared across requests
• LRUStatsCache: Stats shared in an LRU cache across requests
• Flow:
• Conditionally send a GET_TERM_STATS request to participating nodes
• Compute global values, then send a combined SET_TERM_STATS + GET_TOP_IDS request
• Conditionally send GET_FIELDS
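To see why global stats matter, the sketch compares per-shard (local) IDF with IDF computed from aggregated stats, using the classic Lucene TF-IDF formula idf = 1 + ln(numDocs / (docFreq + 1)). The shard statistics are made up and the sketch covers a single term; Solr's stats caches also exchange collection-level statistics and follow the request flow listed on this slide.

```python
import math

def idf(doc_freq, num_docs):
    """Classic Lucene TF-IDF idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Made-up stats for one term on two shards: (docFreq, numDocs).
shard_stats = {"shard1": (90, 1000), "shard2": (1, 1000)}

# Local stats: each shard scores with only what it sees, so the same term
# looks rare on shard2 and common on shard1.
local_idf = {s: idf(df, n) for s, (df, n) in shard_stats.items()}

# Aggregated stats: sum docFreq and numDocs across shards (what the term-stats
# round trip collects), then every shard scores with the one global value.
global_df = sum(df for df, _ in shard_stats.values())
global_n = sum(n for _, n in shard_stats.values())
global_idf = idf(global_df, global_n)

print(local_idf, global_idf)
```

With local stats, a matching document on shard2 would get a much larger idf boost than an equivalent one on shard1, which is the ranking skew distributed IDF removes.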
17. Stats Component
• stats.field can now be used to generate stats over the numeric results of arbitrary functions:
• stats.field={!func}product(price,popularity)
• Stats hang off pivots via tags
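A request combining both features might look like the sketch below. It assumes a hypothetical collection mycollection with price, popularity, cat and inStock fields (as in Solr's example product data) and follows the local-param syntax described in the Solr Reference Guide: the stats field is tagged with {!tag=piv1} and the pivot facet refers to it with {!stats=piv1}. The code only builds the URL.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical collection; field names follow Solr's example product data.
SELECT = "http://localhost:8983/solr/mycollection/select"

params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("stats", "true"),
    # Stats over the numeric output of a function, not a single field.
    ("stats.field", "{!func}product(price,popularity)"),
    # Tag a stats field so a pivot facet can refer to it...
    ("stats.field", "{!tag=piv1}price"),
    ("facet", "true"),
    # ...and compute price stats for every cat/inStock pivot bucket.
    ("facet.pivot", "{!stats=piv1}cat,inStock"),
]
url = SELECT + "?" + urlencode(params)   # a list of pairs keeps repeated keys
print(parse_qs(urlparse(url).query)["stats.field"])
```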
18. And there are more…
• DateRangeField for indexing date ranges, especially multi-valued ones.
• Spatial fields that used to require units=degrees now take distanceUnits=degrees|kilometers|miles instead.
• MoreLikeThis QueryParser: Works in SolrCloud mode too.
• API for managing blobs
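DateRangeField's query relations can be pictured with plain interval logic. The Solr Reference Guide describes an op local param with Intersects (the default), Contains and Within, e.g. {!field f=dateRange op=Contains}[2013 TO 2018]. The Python below is only an illustrative model of those relations on day-granularity ranges, not Solr's prefix-tree implementation; the example ranges are made up, and the multi-valued helper covers the Intersects case only.

```python
from datetime import date

# Illustrative model of DateRangeField relations on closed [start, end] date ranges.
# Day granularity only; real Solr values can be truncated dates such as "2014".

def intersects(a, b):
    """Ranges a and b share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]

def contains(a, b):
    """Range a fully covers range b."""
    return a[0] <= b[0] and b[1] <= a[1]

def within(a, b):
    """Range a lies entirely inside range b."""
    return contains(b, a)

def doc_intersects(values, query):
    """A multi-valued doc intersects the query if any of its ranges does."""
    return any(intersects(v, query) for v in values)

query = (date(2014, 1, 1), date(2014, 12, 31))   # all of 2014
spring = (date(2013, 6, 1), date(2014, 3, 1))
decade = (date(2010, 1, 1), date(2019, 12, 31))
print(intersects(spring, query), contains(decade, query), within(query, decade))
```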
19. and more…
• First class support in SolrJ for Collection API calls
• Upgrade Tika to 1.7: This adds support for parsing
Outlook PST and Matlab (MAT) files.
20. Maturity
• Jepsen tests
• More unit tests and more Solr success stories.
• Protection of ZK content
21. No more WAR!
• Solr is now a standalone application: starting with Solr 5.0, it no longer ships as a WAR
• Upgrade to Jetty 9 coming soon
• This will enable things (e.g. SPDY) that wouldn’t be possible if we had to support Tomcat, Netty, Jetty and everything else.
23. Timeline*
• Release branch cut
• 2nd RC vote in progress.
• Vote - 3 days, 3 votes
• Artifacts propagation to ASF mirrors - 1 day
• Official release note - Right after!
* prospective and subject to how things go
24. Coming soon
• Collections API: REBALANCESHARDS
• Spatial 2D heat-map faceting
• Facet and analytics
• Replication performance
• More API goodness