Webinar: Fusion 2.3 Preview - Enhanced Features with Solr & Spark

2016
OCTOBER 11-14 
BOSTON, MA
http://lucenerevolution.org

Lucidworks Fusion 2.3 Preview
Grant Ingersoll
@gsingers
CTO, Lucidworks

Search-Driven Everything
• Customer Service
• Customer Insights
• Fraud Surveillance
• Research Portal
• Online Retail
• Digital Content

Lucidworks Fusion Is Search-Driven Everything
• Drive next generation relevance via Content, Collaboration and Context
• Harness best in class Open Source: Apache Solr + Spark
• Simplify application development and reduce ongoing maintenance
[Diagram labels: Catalog; Dynamic Navigation and Landing Pages; Instant Insights and Analytics; Personalized Shopping Experience; Promotions; User History]
[Diagram labels: Data Acquisition; Indexing & Streaming; Smart Access API; Recommendations & Alerts; Analytics & Insights; Extreme Relevancy]
Access data from anywhere to build intelligent, data-driven applications.

Fusion Architecture
[Architecture diagram; labeled components:]
• Lucidworks View
• Admin UI
• REST API
• Security built-in
• Core Services: Connectors, Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …
• Apache Spark: Worker, Worker, Cluster Mgr.
• Apache Solr: Shards, Shards
• HDFS (Optional)
• Apache Zookeeper (ZK 1 to ZK N): Shared Config Mgmt, Leader Election, Load Balancing
• Data sources: Database, Web, File, Logs, Hadoop, Cloud

What’s New?
http://www.lucidworks.com/products/fusion

• General Improvements
• Index Pipeline Previews
• Better Time Series Indexing
• Spark goodness
Agenda

• System:
  • Improved JavaScript Stage performance
  • Updated versions for: Solr (5.4.1), Tika (1.12), Spark (1.6.1)
• Security:
  • SAML-based security support
  • API password-redaction capabilities
• Connectors:
  • Box now supports JWT authentication, for easier setup
  • Azure now supports incremental crawling
  • HDFS and Windows Shares now support Kerberos authentication
  • Additional controls for GitHub crawling
General Improvements

• Sample your data source and preview documents without indexing
• Build and test custom pipelines without affecting the original definitions
• Copy, save, merge pipelines upon completion
Enhanced Data Modeling via Index Pipeline Previews

• Greatly simplify the care and feeding of time-based indexes
• Point and click creation of time series shards
• Total control over number of shards and replication
• Easily defined retention and archiving policies (e.g. 30 day retention)
• Intelligent query parsing optimizes shard access
• Ideal for log data and signals
Time Series Done Right
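
As a rough illustration of the retention and shard-access bullets above, here is a minimal Python sketch that models one partition per day. It is not Fusion's implementation: the `logs_YYYY_MM_DD` naming scheme and the helper names `partition_name`, `partitions_for_query` and `expired_partitions` are hypothetical, chosen only to show the idea that a time-bounded query needs to touch only the partitions inside its range and that a 30 day retention policy reduces to a date comparison.

```python
from datetime import date, timedelta
from typing import Dict, List


def partition_name(day: date, prefix: str = "logs") -> str:
    """Name of the (hypothetical) daily partition holding documents for `day`."""
    return f"{prefix}_{day:%Y_%m_%d}"


def partitions_for_query(start: date, end: date, prefix: str = "logs") -> List[str]:
    """Partitions a query over [start, end] has to touch; all others can be skipped."""
    span = (end - start).days
    return [partition_name(start + timedelta(days=i), prefix) for i in range(span + 1)]


def expired_partitions(existing: Dict[str, date], today: date,
                       retention_days: int = 30) -> List[str]:
    """Partitions whose day is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, day in existing.items() if day < cutoff)
```

A scheduler could call `expired_partitions` once a day and archive or drop whatever it returns, while the query side uses `partitions_for_query` to avoid fanning out to every partition.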

• User Interface designed for quickly getting started with Fusion and easy customization
• Popular features are pre-configured
• Built on AngularJS and Apache-licensed open source
• Built-in templates for viewing a variety of data sources
• Learn more: https://lucidworks.com/products/view/
• Fork on GitHub: https://github.com/lucidworks/lucidworks-view
Lucidworks View

Demo
Index Preview
Time Series
Lucidworks View

• Improved Spark streaming and data locality integration resulting in significant performance improvements
• $FUSION_HOME/bin/spark-shell available for rapid prototyping and testing of Spark in the Fusion environment using the command line
• Check out: http://github.com/lucidworks/spark-solr
Spark FTW

• Support for new Spark Job types:
  • Aggregations, Script, Item Similarity, Quality
• Spark Job API now available at “/spark/jobs”
• Create and run your own Spark jobs
• Leverage best in class libraries like MLlib, Mahout and DL4J
Fusion: Creating Jobs for Engineers Since 2015

• Spark has very basic text handling capabilities built-in (whitespace tokenization and a few others)
• Lucene has a fast, capable text analysis system built-in, hence:
• We’ve made Lucene Analyzers work nicely in Spark!
• Learn more at:
  • https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/
  • https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/analysis/LuceneTextAnalyzer.scala
Lucene + Spark: Getting Past the Whitespace
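
To make the gap between whitespace tokenization and a fuller analysis chain concrete, here is a small plain-Python sketch. It is only a stand-in: it does not use Lucene or the LuceneTextAnalyzer class linked above, and the function names and the tiny stopword list are assumptions made for illustration. The second function imitates, very loosely, what an analyzer chain does (lowercasing, splitting on punctuation, dropping stopwords).

```python
import re
from typing import List

# Tiny illustrative stopword list; real analyzers ship far larger, configurable sets.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in"})


def whitespace_tokenize(text: str) -> List[str]:
    """Split on whitespace only; case and punctuation stay attached to tokens."""
    return text.split()


def simple_analyze(text: str) -> List[str]:
    """Lowercase, split on non-alphanumeric characters, then drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]
```

With whitespace splitting alone, "Spark!" and "spark" end up as different terms, which is the kind of mismatch a proper analysis chain removes before the text reaches a model or an index.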

• Fusion can now capture and calculate common search metrics like:
  • Mean Reciprocal Rank
  • Precision/Recall
  • NDCG (Normalized Discounted Cumulative Gain)
• Uses the same framework as signals and aggregations, meaning you can easily track and report across time
Speaking of Quality…
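
For readers who want the metrics above spelled out, here is a minimal Python sketch using their standard textbook definitions. It is not Fusion's code, the function names are illustrative, and the NDCG variant shown (gain divided by log2 of position + 1, ideal ordering taken from the same judged list) is one common formulation among several.

```python
import math
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if none is returned."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def precision_recall_at_k(ranked: Sequence[str], relevant: Set[str],
                          k: int) -> Tuple[float, float]:
    """Precision and recall computed over the top k results."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_k(gains: List[float], k: int) -> float:
    """DCG of the top k results, normalized by the DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0
```

Because each function returns a single number per query or per batch, the values can be stored alongside a timestamp and aggregated over time like any other signal.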

Demo
Spark Shell, run k-Means, index clusters:
https://github.com/lucidworks/fusion-examples/tree/master/fusion-2.3-webinar/src/main/spark-shell
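
The demo itself runs in the Spark shell using the code in the repository above. As a language-neutral reminder of what k-means does before the cluster assignments are indexed, here is a minimal plain-Python sketch of the standard Lloyd iteration on 2-D points. It is not the demo code and does not use Spark MLlib; the function names and the fixed starting centroids are assumptions made so the example is deterministic.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def closest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid nearest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2 + (point[1] - centroids[i][1]) ** 2,
    )


def kmeans(points: Sequence[Point], initial: Sequence[Point],
           iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = list(initial)
    for _step in range(iterations):
        assignments = [closest(p, centroids) for p in points]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    # Final assignments against the final centroids: one cluster id per input point.
    return centroids, [closest(p, centroids) for p in points]
```

The returned cluster id per point is the kind of value that would be written back onto each document as a field when the clusters are indexed.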

• Next Release will be 3.0 (June/July timeframe)
  • Java 8 and above
  • Solr 6.x
  • Query Pipeline Builder
  • Enhanced Machine Learning capabilities
    • Preview in 2.3, but marked experimental
  • Full-featured Experiment Management framework with support for multi-armed bandit optimization
  • Easy import/export for moving from Dev -> QA -> Staging -> Production
Looking Ahead

• Fusion 2.3 will be available week of April 25th
• Learn more about Fusion at: http://www.lucidworks.com/products/fusion
• Learn more about Lucidworks View: https://lucidworks.com/products/view/
• Fusion docs available at http://docs.lucidworks.com
Questions?
