7. Search can be smarter.
location search history query security context
Personal, contextual, relevant results:
consumer-like simplicity and power in the enterprise.
8. Product Offering
Solr Enterprise (Environment: Production)
• Features: Security, Log Analysis / SiLK Support, Dashboards & Reporting, Enhanced Admin UI
• Support: Dev Support (4 Contacts), Operational Support, Regular Health Checks
• Availability: 24x7
• Response Time: SLA-Backed
• Number of Incidents: Unlimited Incidents
• Pricing Model: Per Node
Fusion (Environment: Production)
• Features: Security, Crawlers & Connectors, Log Analysis / SiLK Support, Enhanced Admin UI, Data Enrichment, Machine Learning, Recommendations, Advanced Relevancy Tuning
• Support: Dev Support (4 Contacts), Operational Support, Regular Health Checks
• Availability: 24x7
• Response Time: SLA-Backed
• Number of Incidents: Unlimited Incidents
• Pricing Model: Per Node
Developer Support (Environment: Development)
• Support: How-To Support, Knowledge Base, Fusion Support
• Availability: 9x5
• Response Time: SLA-Backed
• Number of Incidents: Unlimited Incidents
• Pricing Model: Per Named Developer
9. Inside Apache Solr 5
• Get Started
• Dig in
• Go Big
• Get Finished
• Sneak peek
10. Get Started
• Easy to start/stop:
./bin/solr {start|stop}
• Create collections:
./bin/solr create -c <COLL_NAME>
• No more WAR! The web container (Jetty) is now an implementation detail
• Scripts to support installing and running Solr as a service on Linux
11. Your Content, Your Way
JSON’s great:
• Solr 5 “does the right thing” for JSON out of the box
Except when it isn’t:
• Most data isn’t JSON
• Solr handles CSV, XML, and rich content out of the box, without having to install plugins
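As a rough illustration of sending different formats to the same update endpoint, here is a minimal sketch that only builds the HTTP requests and does not send them. The base URL, the collection name "demo", and the helper name `build_update_request` are assumptions made for this example; rich content (PDF, Office files) goes through a separate extraction path and is not covered here.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of a local Solr node; adjust for a real installation.
SOLR_BASE = "http://localhost:8983/solr"

# Content types accepted by Solr's update handler for each body format.
CONTENT_TYPES = {
    "json": "application/json",
    "csv": "application/csv",
    "xml": "application/xml",
}


def build_update_request(collection, body, fmt, commit=True):
    """Build (but do not send) a POST to <collection>/update.

    `build_update_request` is a hypothetical helper for this sketch,
    not part of Solr or any client library.
    """
    if fmt not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {fmt}")
    query = urllib.parse.urlencode({"commit": "true" if commit else "false"})
    url = f"{SOLR_BASE}/{collection}/update?{query}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPES[fmt]},
        method="POST",
    )


# The same documents expressed as JSON and as CSV.
json_req = build_update_request("demo", json.dumps([{"id": "1", "title": "Hello"}]), "json")
csv_req = build_update_request("demo", "id,title\n1,Hello\n", "csv")

# To actually index, a running Solr node is needed:
# urllib.request.urlopen(json_req)
```

Only the `Content-Type` header changes between formats; the endpoint and the commit parameter stay the same.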
12. Your Content, Your Way
• Solr 5 will ship Tika 1.7, adding:
• OCR support
• PST and Matlab
• Better Date Handling
• More flexibility with spatial units
14. Pivots and Stats
• Stats and pivot faceting now work together
• Focused on accuracy of results
• First few steps in unification of all facet types with stats and aggregations
• http://lucidworks.com/blog/you-got-stats-in-my-facets/
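To make the combination concrete, the sketch below builds the query parameters for a pivot facet that carries stats on each bucket. It relies on the tag/stats local-params linkage: a `stats.field` is given a tag, and `facet.pivot` refers to that tag. The field names (`cat`, `inStock`, `price`), the tag `piv1`, and the helper name are illustrative assumptions, not taken from the slides.

```python
import urllib.parse


def pivot_stats_params(pivot_fields, stats_field, tag="piv1", query="*:*"):
    """Return request parameters linking a stats.field to a facet.pivot.

    Hypothetical helper: it only assembles parameters, it does not call Solr.
    """
    return [
        ("q", query),
        ("rows", "0"),  # only the facet/stats sections are of interest
        ("stats", "true"),
        ("stats.field", f"{{!tag={tag}}}{stats_field}"),
        ("facet", "true"),
        ("facet.pivot", f"{{!stats={tag}}}{','.join(pivot_fields)}"),
    ]


# Price statistics for every (cat, inStock) pivot bucket.
params = pivot_stats_params(["cat", "inStock"], "price")
query_string = urllib.parse.urlencode(params)
# Append query_string to <solr>/<collection>/select? to run it.
```

Using a tag rather than repeating the field name lets several pivots share, or choose between, different stats definitions.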
15. API Goodness
• Schema API: REST API for adding field types and dynamic fields
• Managing Request Handlers through API
• Implicit registration of replication, Real Time Get and Administration Handlers
• Improved APIs for managing collections
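A minimal sketch of what these API calls can look like: the functions below build the JSON command bodies that would be POSTed to a collection's `schema` and `config` endpoints. The field type name, the dynamic field pattern, the handler name, and the helper functions themselves are made up for illustration, and the Schema API additionally assumes the collection uses a managed (mutable) schema.

```python
import json


def add_field_type_command(name, tokenizer="solr.StandardTokenizerFactory"):
    """Schema API body that adds a simple text field type."""
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {"tokenizer": {"class": tokenizer}},
        }
    }


def add_dynamic_field_command(pattern, field_type, stored=True):
    """Schema API body that adds a dynamic field rule such as '*_txt'."""
    return {"add-dynamic-field": {"name": pattern, "type": field_type, "stored": stored}}


def add_request_handler_command(name, defaults):
    """Config API body that registers a search request handler."""
    return {
        "add-requesthandler": {
            "name": name,
            "class": "solr.SearchHandler",
            "defaults": defaults,
        }
    }


# Hypothetical names, for illustration only.
schema_body = json.dumps(add_field_type_command("text_simple"))
dynamic_body = json.dumps(add_dynamic_field_command("*_txt", "text_simple"))
handler_body = json.dumps(add_request_handler_command("/products", {"rows": 20}))

# schema_body and dynamic_body would be POSTed to <solr>/<collection>/schema,
# handler_body to <solr>/<collection>/config, with Content-Type application/json.
```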
16. Lucene 5 Highlights
• Stronger index safety guarantees
• Reduced memory usage in a number of areas
• No more FieldCache (replaced w/
UninvertingReader)
• Multi-valued sorting and suggesters
• Better IO defaults when using SSDs
• More efficient handling of merging stored fields
17. Go Big
• Many scaling improvements focused on interactions with
ZooKeeper:
• Split cluster state management reduces chattiness in
large multi-tenant implementations
• Performance of Overseer operations improved by more than 40%
• Better timeout defaults based on real-world testing
• See my Lucene Revolution Keynote for more details:
http://bit.ly/shalinRevKeynote
18. Distributed IDF
• IDF = Inverse Document Frequency = A measure of the
relative importance of a word in a collection
• 4 implementations:
• LocalStatsCache: Local Stats
• ExactStatsCache: One time use aggregation
• ExactSharedStatsCache: Stats shared across requests
• LRUStatsCache: Stats shared in an LRU cache across
requests
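A small worked example of why distributed IDF matters: when a term is rare on one shard and common on another, shard-local statistics give the same term very different weights, while aggregated statistics give one consistent value. The sketch uses the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard names and counts are invented, and a deployment using a different similarity would compute different numbers.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene TF-IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for a single term: (docFreq, numDocs).
shards = {"shard1": (5, 1000), "shard2": (400, 1000)}

# Local stats: each shard scores with only its own counts.
local_idf = {name: classic_idf(df, n) for name, (df, n) in shards.items()}

# Aggregated stats: sum the counts across shards, then compute IDF once.
global_df = sum(df for df, _ in shards.values())
global_n = sum(n for _, n in shards.values())
global_idf = classic_idf(global_df, global_n)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With local stats, a match on shard1 is scored as if the term were rare and a match on shard2 as if it were common, so merged rankings can be skewed; the aggregated value falls between the two.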
19. Get Finished
• Ease of getting started means nothing if you can’t stay running in production
• Jepsen tests simulate network partitions and data loss, i.e. “The Real World”
• https://github.com/LucidWorks/jepsen/tree/solr-jepsen
• http://bit.ly/solr-jepsen
20. Stability Improvements
• Protection of ZK content
• ReplicationHandler now has an option to throttle the
speed of replication
• More control over terminating long-running queries
• Finite default timeouts for select and update requests
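One existing knob for bounding query time is the `timeAllowed` request parameter (in milliseconds); that it is the control this slide has in mind is an assumption. The sketch below builds such a query and checks a response for the `partialResults` flag that Solr sets in the response header when the limit is hit. The helper names and the sample responses are made up for illustration.

```python
import urllib.parse


def time_limited_query(q, time_allowed_ms):
    """Build select parameters that cap query execution time."""
    return urllib.parse.urlencode(
        {"q": q, "timeAllowed": str(time_allowed_ms), "wt": "json"}
    )


def is_partial(response):
    """True if a parsed JSON response reports that results were cut short."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


qs = time_limited_query("title:solr", 500)

# Hand-written stand-ins for parsed JSON responses, not real Solr output.
complete = {"responseHeader": {"status": 0}, "response": {"numFound": 3}}
truncated = {"responseHeader": {"status": 0, "partialResults": True}}

if is_partial(truncated):
    print("query hit the time limit; results may be incomplete")
```

Callers that set a time limit should check the flag, since a truncated result set is otherwise indistinguishable from a complete one.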
22. Near Term Road Map
• Facets and Analytics:
• Mix and match all facet types and stats (SOLR-6352, SOLR-6353, SOLR-4212)
• Percentiles via t-digest (SOLR-6350)
• Replication performance (SOLR-6816)
• Finish off Config APIs (various)
• Data-location-aware ValueSource implementation for fast-changing distributed data
• First-class support for more languages OOTB