This document discusses enterprise search solutions and describes Searchbox. It summarizes that Searchbox leverages the Apache Solr search technology and offers plugins, a search framework, and search-as-a-service. Enterprise search differs from web search in focusing on productivity within an organization using heterogeneous structured, semi-structured, and unstructured data from various sources. Searchbox aims to provide an iterative tool to derive value from big data by enabling information retrieval and mining of both big and small data sources.
3. WHAT IS SEARCHBOX?
Searchbox leverages Apache Solr Technology and:
Offers various Solr plugins
Offers a Search Framework which can be used to develop custom search engines tied to business needs
Offers Search-as-a-Service (in the cloud)
4. ENTERPRISE SEARCH VS. WEB SEARCH
Productivity
Heterogeneous data
False negative / False positive
Structured, semi-structured, unstructured data
7. FALSE NEGATIVES ARE A KILLER
On the web it’s ok not to find a specific document
Not an option within a company
Real time indexing
Liability concerns
Compliance (Why this result?)
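The false negative / false positive distinction above can be made concrete with a small sketch. It assumes that, for a single query, we know which documents are truly relevant; the document IDs and the function name `evaluate_query` are hypothetical and not taken from Searchbox. A false negative is a relevant document the engine failed to return, so it lowers recall; a false positive is an irrelevant document that was returned, so it lowers precision.

```python
def evaluate_query(retrieved, relevant):
    """Compare one query's results against the known relevant documents.

    Returns precision, recall, and the sorted lists of false positives
    and false negatives. All inputs are illustrative document IDs.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    false_positives = retrieved - relevant   # returned, but not relevant
    false_negatives = relevant - retrieved   # relevant, but never returned
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": sorted(false_positives),
        "false_negatives": sorted(false_negatives),
    }


# Hypothetical example: the engine returns three documents, four are relevant.
report = evaluate_query(
    retrieved=["doc1", "doc2", "doc3"],
    relevant=["doc1", "doc3", "doc4", "doc5"],
)
print(report["false_negatives"])  # the documents an employee would never find
```

In an enterprise setting the `false_negatives` list is the one to watch: each entry is a document that exists in the company but cannot be found through search.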
10. WHAT IS BIG DATA?
Distributed & disparate data from several sources
Structured, semi-structured, unstructured
Big data & machine learning
Enhance existing unstructured data (tagging, entity extraction, summarization)
Content curation
11. FROM BIG TO SMALL DATA STACK
Scalable Backend infrastructure & archiving
Information Retrieval
Analysis / Discovery
Visualization
SharePoint, Cassandra, Hadoop, Oracle, SAP, MongoDB, ...
Solr, Lucene, Elasticsearch, Business Warehouse, SAP BW, ...
Searchbox backend
Searchbox frontend
Big Data
Small Data
12. OUR APPROACH TO CONVERGENCE
- Index
- Crawl
- Fields
- Metadata
- Facets
- Filters
- More Like This
- Search Framework
- Presets
- Templating
- Tagging
- Summarization
- Sorting
Connect
Discover
Lift / Enhance
Specialize
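Several of the items above (fields, facets, filters) correspond to standard Solr query parameters. The following is a minimal sketch of how such a request might be assembled, using Solr's `q`, `fq`, `facet`, and `facet.field` parameters on the `/select` handler. The host, core name, field names, and the helper `build_select_url` are assumptions made for illustration; this is not Searchbox's own API.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, text, filters=(), facet_fields=(), rows=10):
    """Build a Solr /select URL with filter queries and field facets.

    base_url, core, and all field names are hypothetical placeholders.
    """
    # A list of pairs (not a dict) lets 'fq' and 'facet.field' repeat.
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]            # filters narrow the result set
    if facet_fields:
        params.append(("facet", "true"))              # facets count values per field
        params += [("facet.field", f) for f in facet_fields]
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical usage: search a 'documents' core, restricted to one department,
# and ask for facet counts on author and file type.
url = build_select_url(
    "http://localhost:8983/solr",
    "documents",
    "quarterly report",
    filters=["department:legal"],
    facet_fields=["author", "file_type"],
)
print(url)
```

Filters restrict which documents match without changing how they are scored, while facets return per-value counts that a frontend can show as drill-down options.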
13. CONCLUSION
Working with big data is expensive and time-consuming
Requires a high level of expertise in multiple fields (Networking, Programming, ML, NLP, Mathematics, Statistics, ...)
Information Retrieval / mining can serve as an iterative tool to leverage value from big data
14. SEARCHBOX FOR BIG DATA
Data centric (Machine learning based enhancements)
Solr storage (Solr 4.x as scalable key-value store)
Hosted Solr Cluster with sharding and replication
Iterative process
Guided administration panel
Human friendly as opposed to CLI
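To illustrate what sharding and replication mean in a Solr 4.x cluster, here is a minimal sketch that builds a request for SolrCloud's Collections API, using its `action=CREATE`, `name`, `numShards`, and `replicationFactor` parameters. The host, the collection name, and the helper `create_collection_url` are hypothetical; how Searchbox provisions its hosted cluster is not described in the slides, and this is the kind of command-line-level step a guided administration panel would presumably hide.

```python
from urllib.parse import urlencode


def create_collection_url(solr_url, name, num_shards, replication_factor):
    """Build a SolrCloud Collections API CREATE request URL.

    solr_url and name are hypothetical placeholders. The index is split
    into num_shards pieces, and each piece is kept in replication_factor
    copies.
    """
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("num_shards and replication_factor must be >= 1")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{solr_url}/admin/collections?{urlencode(params)}"


def total_cores(num_shards, replication_factor):
    """Number of index cores the cluster must host for this layout."""
    return num_shards * replication_factor


# Hypothetical layout: 4 shards for scale, 2 copies of each for availability.
print(create_collection_url("http://localhost:8983/solr", "bigdata", 4, 2))
print(total_cores(4, 2))
```

Sharding spreads a large index over several nodes so it can grow, while replication keeps multiple copies of each shard so queries survive a node failure.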