1. 1
An Overview of the Enterprise Search
Market, & Current Best Practices
Iain Fletcher
ifletcher@Searchtechnologies.com
April 20, 2015
2. 2
Agenda
• A brief overview of the current enterprise search market
• The convergence of search with analytics disciplines
• Likely future architectures for search applications
4. 4
High-level Search Engine Classifications
1. Part of a portfolio; many are recently acquired technologies
– E.g. SharePoint, HP Autonomy, IBM/Vivisimo, Dassault/Exalead
2. Stand-alone specialists, often bought to address specific applications
– E.g. GSA, Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: e.g. Lucene, Solr, Elasticsearch
– With support/add-ons: e.g. LucidWorks, Cloudera Search, Elastic
4. Cloud-based services, typically based on open source technology
– E.g. Amazon CloudSearch, Microsoft Azure Search
5. 5
Market Observations
The dominant market share is with SharePoint, open source, and the Google Search Appliance
• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to very widespread deployment
• The Google brand sells search, and a lot of GSAs have been shipped during the past few years
6. 6
Functional Observations
• Core indexing / searching is generally fast and reliable
– Search is a maturing technology
• Key differences remain in peripheral functionality, such as content processing prior to indexing. For example:
– Coveo, Attivio, and Sinequa all have well-developed indexing pipelines, UI tools, and a range of data connectors
– SharePoint and GSA have limited content processing functionality and rely on third parties for connectivity
– Solr, Elasticsearch, AWS CloudSearch, and Azure Search don't provide a formal indexing pipeline, UI, or connectors
7. 7
Further Observations
• The search engines with less focus on peripheral issues (such as content processing and connectivity) have dominant market share
• Connectivity remains challenging, especially when combined with continual data growth
• The movement of data sets to the cloud adds further complexity
– Hybrid indexing environments will be with us for some years
8. 8
Content Processing / Text Analysis Examples
• Normalization
– Names, dates, synonyms, spelling
• Entity identification and resolution
• Additional metadata from content analysis
• Categorization
• Document vector extraction
• Splitting and concatenation
• Dupe & near-dupe detection
• Link analysis
• Ingesting external signals
• Security enforcement and analysis
[Diagram labels: Index; security, category, metadata]
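A few of the processing steps listed on this slide (date and synonym normalization, near-dupe detection) can be sketched as a chain of small functions. This is a minimal illustration in Python, not any vendor's pipeline: the synonym table, the US-style date pattern, the document field names, and the 0.8 similarity threshold are all assumptions made for the example.

```python
import re
from datetime import datetime

# Hypothetical synonym table; a real deployment would load a curated list.
SYNONYMS = {"ms": "microsoft", "gsa": "google search appliance"}

# Assumes US-style M/D/YYYY dates in the source text.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def normalize_dates(text):
    """Rewrite M/D/YYYY dates as ISO 8601 (YYYY-MM-DD)."""
    def to_iso(match):
        month, day, year = (int(g) for g in match.groups())
        try:
            return datetime(year, month, day).strftime("%Y-%m-%d")
        except ValueError:
            return match.group(0)  # not a real calendar date: leave untouched
    return DATE_PATTERN.sub(to_iso, text)


def normalize_terms(text):
    """Lowercase the text and replace known synonyms token by token."""
    tokens = re.findall(r"[\w-]+", text.lower())
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


def shingles(text, size=3):
    """Return the set of overlapping word n-grams used for near-dupe comparison."""
    words = text.split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def process(doc):
    """Run one document (a dict with a 'body' field) through the steps."""
    body = normalize_terms(normalize_dates(doc["body"]))
    return {**doc, "body_normalized": body, "shingles": shingles(body)}


def is_near_duplicate(doc_a, doc_b, threshold=0.8):
    """Compare two processed documents by shingle overlap."""
    return jaccard(doc_a["shingles"], doc_b["shingles"]) >= threshold
```

Normalizing before shingling means that two documents differing only in date format or vendor abbreviation compare as duplicates, which is one reason the order of pipeline stages matters.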
9. 9
Future Directions
So what will search architectures look like in the future?
Important Influences:
• The need for organizational and analytical agility
• The convergence of search and ("big data") analytics
• Continual growth in data volumes, and churn in repository / storage fashions
10. 10
Converging Architectures
Let's take a brief look at:
1. The "Big Data Architecture", evangelized by IBM, Cloudera, etc.
2. Contemporary Search Architectures
Background Info
12. 12
The Traditional Search Architecture
[Diagram: Content Sources (Employee Directory, CMS, File Share, etc.); Integrated Search Engine (Connectors, Index Pipeline, Search Index); UI]
Designed for Unstructured Content
13. 13
The Traditional Search Architecture
[Diagram: Content Sources (Employee Directory, CMS, File Share, etc.); Integrated Search Engine (Connectors, Index Pipeline, Search Index); UI; label: RE-INDEX]
• A few documents per second?
• There are only 2.6 million seconds in a month
• If you change something significant in the index pipeline, you will need to re-index
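The arithmetic behind these bullets is easy to check. The sketch below assumes a 30-day month, and the corpus size and indexing rates are made-up figures used purely for illustration; only the seconds-per-month total comes from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, i.e. about 2.6 million


def reindex_days(document_count, docs_per_second):
    """Days needed to re-crawl and re-index a corpus at a sustained rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return document_count / docs_per_second / SECONDS_PER_DAY


def docs_per_month(docs_per_second):
    """Upper bound on documents processed in a 30-day month at a given rate."""
    return docs_per_second * SECONDS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical corpus of 20 million documents at three assumed rates.
    for rate in (3, 30, 300):
        days = reindex_days(20_000_000, rate)
        print(f"{rate:>4} docs/s -> {days:6.1f} days")
```

Under these assumed figures, 3 documents per second covers fewer than 8 million documents in a month of continuous running, so a full re-index of the 20-million-document corpus would take about 77 days.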
14. 14
A Better Search Architecture
• Re-indexing rates greatly improved
• "Touch-time" with repositories can be managed autonomously
[Diagram: Content Sources (Employee Directory, CMS, etc.); Connectors; Staging Repository; Content Processing; Search Engine (Index Pipeline, Search Index); labels: RE-INDEX, Iterative Development]
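One way to picture the staging repository idea: connectors write raw content to a local store once, and re-indexing replays that store through the current pipeline without touching the source systems again. The sketch below is an illustration under stated assumptions, not the presenter's implementation. The names `StagingRepository`, `crawl`, and `reindex` are hypothetical, and in-memory dicts stand in for the source system, the staging store, and the search index.

```python
from typing import Callable, Dict, List, Tuple


class StagingRepository:
    """Holds raw documents fetched by connectors, keyed by document ID."""

    def __init__(self) -> None:
        self._raw: Dict[str, str] = {}

    def put(self, doc_id: str, content: str) -> None:
        self._raw[doc_id] = content

    def items(self) -> List[Tuple[str, str]]:
        return list(self._raw.items())


def crawl(source: Dict[str, str], staging: StagingRepository) -> int:
    """Connector step: the only place where the source repository is touched."""
    for doc_id, content in source.items():
        staging.put(doc_id, content)
    return len(source)


def reindex(staging: StagingRepository,
            pipeline: Callable[[str], str]) -> Dict[str, str]:
    """Rebuild the index from staged content using the current pipeline."""
    return {doc_id: pipeline(content) for doc_id, content in staging.items()}


# Crawl once, then rebuild the index twice with different pipelines.
source = {"doc-1": "  Hello World ", "doc-2": "Enterprise SEARCH"}
staging = StagingRepository()
crawl(source, staging)

index_v1 = reindex(staging, str.strip)                    # first pipeline
index_v2 = reindex(staging, lambda s: s.strip().lower())  # changed pipeline, no re-crawl
```

Because `reindex` reads only from the staging store, a pipeline change costs a local replay rather than another pass over every content source.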
15. 15
The Future Architecture?
[Diagram: Hadoop; Content Sources (Employee Directory, CMS, etc.); Connectors; Staging Repository; Content Processing; Search Engine (Index Pipeline, Search Index); labels: RE-INDEX, Iterative Development]
• This environment will encourage ever more sophisticated content processing
• We expect much innovation in text analytics during the next few years
• Driven by cheap, easily available processing power
• The deliverable is a richer search index
16. 16
The Future Architecture
[Diagram: Hadoop; Content Sources (Employee Directory, CMS, etc.); Connectors; Staging Repository; Content Processing; Search Engine (Index Pipeline, Search Index); labels: RE-INDEX, Iterative Development]
• Google.com has worked something like this for 10+ years
17. 17
An Integrated Search/Analytics Architecture
[Diagram: Hadoop; Content Sources (CMS, File system) with Connectors / Crawlers; Data Sources (Data Warehouse, Logfiles, etc.) with ETL; Staging Repository; Content Processing; Iterative Development; Rapid, & ad hoc Indexing; applications: OSINT Search App., Search App., Analysis App., Analysis App.]
• Encourages agile exploitation of data and content resources
18. 18
Summary
• Search and analytics are converging on the same architecture
• Autonomous connectivity and content processing systems simplify and de-risk projects
• The "search index" is a mature technology, and is becoming a commodity
– Thanks to open source alternatives setting high standards
• The centre of attention is shifting from the index to content preparation
– This perhaps fits well with the profile of the dominant market leaders: SharePoint, GSA, Solr, Elasticsearch...
19. 19
Conclusion
• The foundation of great search and analytical applications is a clean, rich, and detailed index
• Much of the innovation during the next few years will be in content analytics
– The architecture discussed makes it easy to adopt new ideas and products
– And it promotes agility, experimentation, and innovation
• In a data-driven world, agility is vital
20. 20
And finally... the analyst quote
"Enterprise Search Can Bring Big Data Within Reach"*
• Multiple, purpose-built indexes that are derived from enriched content are necessary.
* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, blog post, April 2014
http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
21. 21
Thank you!