4. • Best-in-breed search solution built on Apache Lucene and Solr
• Easily capture signals like clicks, shares, ratings, etc. and make them actionable
• Powerful data ingestion and analysis capabilities enabling machine learning, recommendations and positive user feedback loops
• Effortless scale leveraging proven frameworks and algorithms
• Easy integration with big data tools like Hadoop
Fusion Foundations
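One common way to make signals actionable is to roll raw events up into a per-document boost that can later feed back into ranking. The sketch below assumes a simple (doc_id, signal_type, value) event format and hypothetical per-type weights. Neither the format, the weights nor the name `aggregate_signals` comes from Fusion's actual signal schema or API.

```python
from collections import defaultdict

# Hypothetical weights per signal type (assumption, not Fusion defaults).
SIGNAL_WEIGHTS = {"click": 1.0, "share": 3.0, "rating": 0.5}

def aggregate_signals(events):
    """Sum weighted signal events into a boost score per document id.

    Each event is a (doc_id, signal_type, value) tuple: for ratings the
    value is the rating itself, for clicks and shares it is a count.
    Signal types without a weight are ignored.
    """
    boosts = defaultdict(float)
    for doc_id, signal_type, value in events:
        weight = SIGNAL_WEIGHTS.get(signal_type)
        if weight is None:
            continue
        boosts[doc_id] += weight * value
    return dict(boosts)

events = [
    ("doc1", "click", 1),
    ("doc1", "click", 1),
    ("doc2", "share", 1),
    ("doc1", "rating", 4),
    ("doc3", "view", 1),  # no weight defined for "view", so it is skipped
]
print(aggregate_signals(events))  # {'doc1': 4.0, 'doc2': 3.0}
```

Feeding a boost map like this back into ranking is what closes the user feedback loop the slide refers to.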
5. [Architecture diagram. Components: Millions of Users; Billions of Docs; Proxy/LB; Optional REST; Worker services (Recs, Pipes, Metrics, NLP, Scheduler, Blobs, Messaging, Connectors, Signals); Worker Cluster Manager; Spark; Solr shards; HDFS; ZooKeeper ensemble (ZK 1 … ZK N) for shared config management, leader election and load balancing; security woven throughout.]
Fusion Architecture
7. • Data exploration and visualization
• Easy ingestion, feature selection and data reduction
• REST APIs for easy integration with commonly used tools
• Quick-and-dirty classification and clustering
• Powerful and scalable aggregations, math/stats framework leveraging Apache Spark
• Out-of-the-box NLP tools for part-of-speech tagging, sentence detection, named entity recognition and more
• Out-of-the-box recommenders plus Mahout extensions
Fusion Data Science Use Cases
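Quick-and-dirty classification can be as simple as a nearest-centroid classifier over bag-of-words term vectors. The sketch below is a generic illustration under that assumption (whitespace tokenization, cosine similarity, toy training data); it is not a Fusion API, and the function names are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def train_centroids(labeled_docs):
    """Average the term vectors of each label's documents into a centroid."""
    sums, counts = {}, Counter()
    for text, label in labeled_docs:
        sums.setdefault(label, Counter()).update(vectorize(text))
        counts[label] += 1
    return {label: {t: v / counts[label] for t, v in vec.items()}
            for label, vec in sums.items()}

def classify(text, centroids):
    """Pick the label whose centroid is most similar to the text."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

train = [
    ("solr lucene search index", "search"),
    ("lucene query ranking", "search"),
    ("spark cluster jobs", "compute"),
    ("spark rdd aggregation jobs", "compute"),
]
centroids = train_centroids(train)
print(classify("ranking a lucene index", centroids))  # search
```

The same term vectors could just as well feed a clustering step such as k-means; nearest-centroid is simply the smallest useful classifier to show.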
8. Lucene: Core search, pluggable ranking, advanced storage, sparse matrix
Solr: Faceting, function queries, basic stats, scaling, easy setup, UIMA, basic NLP, search clustering
Fusion: Pipelines, Connectors/Crawlers, Dashboards/UI, Spark integration, advanced stats, large-scale aggregations
Fusion: Standing on the shoulders of giants.
10. • Ingestion
• 60+ connectors, plus the option to push data in directly via REST APIs
• Feature Selection
• Analyzers for all types
• Easily get/calculate weights for terms and attach payloads
• Term Vectors/Term Dictionary
• Data Reduction
• Filters
• Analyzers
• Data quality tools
Ingestion, Selection, Reduction
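The ingestion, selection and reduction steps above can be sketched end to end: an analyzer chain (tokenize, lowercase, stop-filter) reduces raw text, term vectors capture per-document term counts, and term weights are computed from them. This is a minimal sketch using the textbook tf-idf formula and a toy stopword list, both assumptions for illustration; Lucene's actual scoring (BM25 by default in current versions) differs.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; real analyzers ship fuller lists).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is"}

def analyze(text):
    """Toy analyzer chain: tokenize on letters/digits, lowercase, stop-filter."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def term_vectors(docs):
    """One term-frequency vector (term -> count) per document."""
    return [Counter(analyze(doc)) for doc in docs]

def tfidf(vectors):
    """Weight each term as tf * log(N / df), the classic tf-idf formula."""
    n = len(vectors)
    df = Counter()
    for vec in vectors:
        df.update(vec.keys())  # count each term once per document
    return [{term: tf * math.log(n / df[term]) for term, tf in vec.items()}
            for vec in vectors]

docs = [
    "The Lucene index and the Solr index",
    "Spark jobs read the Solr index",
    "Spark aggregation jobs",
]
vectors = term_vectors(docs)
weights = tfidf(vectors)
print(weights[0])
```

A term that appears in every document gets weight 0 under this formula, which is one simple signal for dropping it during feature selection.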
11. • Math:
• Search is essentially a query vector multiplied by a term-document matrix
• Aggregations
• Enable advanced computation over both core content and Fusion’s signals
• Make it easy to try out by leveraging Solr
• Ship with prebuilt “named” aggregations to cover common scenarios
Aggregations and Math
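To make the vector-times-matrix view of search concrete: in the vector space model the term-document matrix A has one row per term and one column per document, and the scores are s = qᵀA for a query vector q. The sketch below uses raw term counts and plain Python lists on toy data; a real engine stores the matrix sparsely (as an inverted index) and applies weighting and normalization, so this illustrates the idea rather than how Lucene scores.

```python
# Rows: terms; columns: documents (raw term counts, toy data).
terms = ["lucene", "solr", "spark"]
doc_ids = ["d1", "d2", "d3"]
matrix = [
    [2, 0, 1],  # lucene
    [1, 1, 0],  # solr
    [0, 3, 1],  # spark
]

def score(query_terms):
    """Compute q^T * A: one dot product of the query vector per document column."""
    q = [1 if t in query_terms else 0 for t in terms]
    return {
        doc: sum(q[i] * matrix[i][j] for i in range(len(terms)))
        for j, doc in enumerate(doc_ids)
    }

print(score({"lucene", "spark"}))  # {'d1': 2, 'd2': 3, 'd3': 2}
```

Sorting the resulting scores in descending order gives the ranked result list; aggregations over the same matrix (column sums, term co-occurrence) fall out of the same linear-algebra view.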
12. • Effortless scale, integrated with Fusion and Solr
• Leverage existing libraries like:
• Mahout
• Deeplearning4j
• GraphX, MLlib
• As easy as:
• bin/spark start
• http://.../aggregator/jobs/twitter/hashtags_per_author?spark=true
Spark FTW!
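The `hashtags_per_author` job in the URL above is a named aggregation whose exact definition is not given on the slide. A plausible reading, sketched here in plain Python for illustration only, counts how often each author uses each hashtag. The (author, text) record format and the function name are hypothetical; on Spark the same logic would be a distributed group-by and count.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")

def hashtags_per_author(tweets):
    """Count hashtag usage per author.

    `tweets` is an iterable of (author, text) pairs; hashtags are
    lowercased so #Solr and #solr are counted together. Authors who
    used no hashtags do not appear in the result.
    """
    counts = defaultdict(Counter)
    for author, text in tweets:
        for tag in HASHTAG.findall(text):
            counts[author][tag.lower()] += 1
    return {author: dict(tags) for author, tags in counts.items()}

tweets = [
    ("alice", "Trying #Solr with #Spark"),
    ("bob", "#spark all the things"),
    ("alice", "More #solr tuning"),
]
print(hashtags_per_author(tweets))
# {'alice': {'solr': 2, 'spark': 1}, 'bob': {'spark': 1}}
```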
14. • Fusion powers recommendation use cases such as:
• People who bought this also bought that
• Related searches, spellings and more
• Session analysis
• Fusion ships with several built-in recommendation options
- Graph-based and collaborative-filtering-based approaches
• Easily enable multi-modal recommendations that combine:
- Content
- Collaborative Filtering
- Spatial
- Historic/Context
Recommendations
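A minimal sketch of the "people who bought this also bought that" idea using item co-occurrence counts, one simple form of item-based collaborative filtering. The basket data and function names are hypothetical, and this is not Fusion's built-in recommender, which the slide describes only as graph and collaborative filtering based.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # sorted() keeps iteration deterministic; set() drops duplicate items
        for a, b in permutations(sorted(set(basket)), 2):
            co[a][b] += 1
    return co

def also_bought(co, item, k=2):
    """Top-k items most often bought with `item`; ties broken by name."""
    ranked = sorted(co.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]

baskets = [
    ["solr_book", "lucene_book", "spark_book"],
    ["solr_book", "lucene_book"],
    ["solr_book", "hadoop_book"],
    ["spark_book", "hadoop_book"],
]
co = build_cooccurrence(baskets)
print(also_bought(co, "solr_book"))  # ['lucene_book', 'hadoop_book']
```

Raw co-occurrence counts favor popular items; normalizing by each item's overall frequency is a common refinement, and scores from content, spatial or historic signals could be blended in for the multi-modal case.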
15. • Spark
• APIs for running non-Lucid Spark jobs
• Integration with 3rd party Spark instances (from major Hadoop distros)
• Solr RDD extensions for term dictionary, term vectors
• UI for managing Aggregations
• Full-fledged Graph API
• More Math: matrices, functions, etc.
What's Next