As data sets continue to grow, search remains a key technology for many applications. But what is the current state of the enterprise search market? Which providers are gaining market share, and what are the latest developments and innovations? Based on experience from dozens of recent search projects using a range of technologies, this presentation will summarize market conditions, discuss current best practices for creating great search systems, and suggest some future trends to watch out for.
The Enterprise Search Market in a Nutshell
1. 1
The Enterprise Search Market in a
Nutshell
Iain Fletcher
ifletcher@searchtechnologies.com
October 19, 2015
ICIC 2015, Nice
2. 2
Agenda
• About Search Technologies (30 seconds)
• The enterprise search market
• Likely future architectures for supporting
important search applications
3. 3
Search Technologies: Background
San Diego
London UK
San Jose, CR
Cincinnati
San Francisco
Washington
(HQ)
Frankfurt DE
• Founded 2005
• 180 employees
• 600+ customers
• Independent consulting company
• Focus on enterprise search
• Working with all leading platforms
Prague, CZ
6. 6
High-level Search Engine Classifications
1. Part of a portfolio, many are recently acquired technologies
– E.g. SharePoint/FAST, HP Autonomy, IBM/Vivisimo, Dassault/Exalead,
Oracle/Endeca
2. Stand-alone specialists, often deployed to address specific apps or
challenges
– E.g. GSA, Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: Lucene, Solr, Elasticsearch
– With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK
4. Cloud-based services, typically based on open source technology
– E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)
7. 7
Market Observations
The dominant market share is currently with
SharePoint, open source, and the GSA
• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a
compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to widespread deployment
• The Google brand sells – and a lot of GSAs have been
shipped during the past few years
8. 8
Functional Observations
• Core indexing / searching is generally fast and reliable
– Search is a maturing / converging technology
• Key differences remain in peripheral functionality, such as
content processing prior to indexing, and query processing
– Coveo, Attivio, Sinequa etc. have well-developed indexing
pipelines, UI tools, and a range of data connectors
– SharePoint and GSA are delivered with limited content
processing functionality and limited connectivity
– Solr, Elasticsearch, AWS CloudSearch and Azure Search don’t
provide a formal indexing pipeline, UI, or connectors
9. 9
Further Observations
• The search engines with less focus on peripheral issues
such as content processing and connectivity have dominant
market share
• Connectivity is often challenging, especially when
combined with continual data growth, and document-level
security requirements
• The movement of data sets to the cloud adds further
complexity for enterprise search systems
– Hybrid indexing environments will be with us for some years
– Some content sets in the cloud, some behind the firewall
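The document-level security requirement mentioned above is often handled by storing access-control information with each indexed document and filtering at query time. The following is a minimal sketch of that idea only; the `INDEX` contents, the `allowed_groups` field, and the `search` function are all hypothetical and not taken from any particular search engine.

```python
# Hypothetical index entries: each document carries the groups allowed to read it.
INDEX = [
    {"id": "hr-001", "title": "Salary bands", "allowed_groups": {"hr"}},
    {"id": "eng-042", "title": "Search architecture", "allowed_groups": {"engineering", "hr"}},
    {"id": "pub-007", "title": "Holiday calendar", "allowed_groups": {"everyone"}},
]


def search(query, user_groups):
    """Return ids of documents whose title matches and that the user may see."""
    query = query.lower()
    # Assumption: every user implicitly belongs to the "everyone" group.
    groups = set(user_groups) | {"everyone"}
    return [
        doc["id"]
        for doc in INDEX
        if query in doc["title"].lower() and doc["allowed_groups"] & groups
    ]


if __name__ == "__main__":
    print(search("", {"engineering"}))
    print(search("salary", {"engineering"}))
```

Keeping the stored permissions in step with each source repository is part of what makes connectivity hard: every connector has to extract permissions as well as content.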
10. 10
Great Search requires Attention to Detail
E.g. in content processing
prior to indexing
• Normalization
– Names, dates, synonyms….
• Entity identification and resolution
• Categorization
• Document vector extraction
• Document splitting and concatenation
• Link & popularity analysis
• Dupe & near-dupe detection
Index
security
category
metadata
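Two of the content processing steps listed above, normalization and near-dupe detection, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method used by any product named in this deck: the synonym table, the date formats, the shingle size, and the 0.8 threshold are all hypothetical choices, and word-shingle Jaccard similarity is only one of several near-duplicate techniques.

```python
import re
from datetime import datetime

# Hypothetical synonym table; a real deployment would load a curated list.
SYNONYMS = {"gsa": "google search appliance", "uk": "united kingdom"}

# Hypothetical set of date formats seen in the source content.
DATE_FORMATS = ("%d/%m/%Y", "%B %d, %Y", "%Y-%m-%d")


def normalize_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_terms(text):
    """Lowercase, split into alphanumeric tokens, and expand known synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(tok, tok) for tok in tokens]


def shingles(tokens, size=3):
    """Set of overlapping word n-grams used to compare documents."""
    if len(tokens) < size:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.8):
    """Compare two texts after normalization, using shingle overlap."""
    sa = shingles(normalize_terms(doc_a))
    sb = shingles(normalize_terms(doc_b))
    return jaccard(sa, sb) >= threshold
```

Running normalization before the comparison means that differences in case and punctuation do not hide a duplicate.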
11. 11
Future Directions for Search
So what will search architectures look like in the future?
Important influences:
• The business need for organizational and analytical agility
• The convergence of search and (“big data”) analytics
• Continual growth in data volumes, and evolution in
repository / storage fashions
12. 12
Converging Architectures
Let’s take a brief look at:
1. The “Big Data Architecture”, as evangelized by IBM,
Cloudera, etc.
2. Recent Search Architectures
Background Info
14. 14
The Traditional Search Architecture
[Diagram: Content Sources (Employee Directory, CMS, File Share, etc.) and an Integrated Search Engine containing Connectors, Index Pipeline, Search Index, and UI]
Designed for Unstructured Content
15. 15
The Traditional Search Architecture
[Diagram: the traditional search architecture, as on the previous slide]
• As data volumes grow, re-indexing
becomes challenging
• The rate at which content can be
acquired from repositories is usually the
bottleneck
Designed for Unstructured Content
16. 16
The Traditional Search Architecture
[Diagram: the traditional search architecture, as on the previous slide]
• A few documents-per-second?
• There are only 2.6 million seconds in a
month
RE-INDEX
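The arithmetic behind the two bullets above can be worked through as follows. The 30-day month matches the slide's figure of about 2.6 million seconds; the sample rates and the 50-million-document corpus are hypothetical numbers chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, i.e. about 2.6 million


def docs_per_month(docs_per_second):
    """Documents that can be acquired in a 30-day month at a steady rate."""
    return docs_per_second * SECONDS_PER_MONTH


def days_to_reindex(total_docs, docs_per_second):
    """Days needed to re-acquire a whole corpus at a steady rate."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (1, 3, 10):  # hypothetical acquisition rates
        print(f"{rate} docs/s -> {docs_per_month(rate):,} docs per month")
    # Hypothetical corpus of 50 million documents at 3 docs/s
    print(f"{days_to_reindex(50_000_000, 3):.1f} days for a full re-index")
```

At a few documents per second, a full re-index of a corpus in the tens of millions takes months, which is why the acquisition rate dominates the design.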
17. 17
A Better Search Architecture
• Re-indexing rates greatly improved
• “Touch-time” with repositories can be managed autonomously
[Diagram labels: Content Sources (Employee Directory, CMS, etc.); Connectors; Content Processing; Secure Cache; Iterative Development; RE-INDEX; Search Engine containing Index Pipeline and Search Index]
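The benefit of placing a cache between the repositories and the index can be sketched as below. This is a toy illustration under stated assumptions, not the design of any product: `CachedIndexer`, its in-memory dictionary cache, and the `fetch` and `process` callables are hypothetical stand-ins for connectors, a secure cache, and content processing.

```python
class CachedIndexer:
    """Toy model: repositories are read once, re-indexing runs from the cache."""

    def __init__(self, fetch, process):
        self.fetch = fetch          # slow call into the source repository
        self.process = process      # content processing applied before indexing
        self.cache = {}             # doc_id -> raw content (stand-in for the secure cache)
        self.index = {}             # doc_id -> processed content
        self.repository_calls = 0   # counts "touch-time" with the repository

    def crawl(self, doc_ids):
        """Acquire content from the repository and store it in the cache."""
        for doc_id in doc_ids:
            self.cache[doc_id] = self.fetch(doc_id)
            self.repository_calls += 1

    def reindex(self):
        """Rebuild the index from cached content without touching the repository."""
        self.index = {doc_id: self.process(raw) for doc_id, raw in self.cache.items()}


if __name__ == "__main__":
    repository = {"a": "Enterprise Search", "b": "Big Data"}
    indexer = CachedIndexer(fetch=repository.__getitem__, process=str.upper)
    indexer.crawl(["a", "b"])
    indexer.reindex()
    indexer.process = str.lower   # change the content processing, then re-index
    indexer.reindex()
    print(indexer.index, indexer.repository_calls)
```

Changing the processing step and re-indexing does not increase `repository_calls`, which is the property that allows iterative development of content processing.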
18. 18
The Future Architecture?
[Diagram labels: Hadoop; Content Sources (Employee Directory, CMS, etc.); Connectors; Content Processing; Secure Cache; Iterative Development; RE-INDEX; Search Engine containing Index Pipeline and Search Index]
• This environment will encourage ever more sophisticated text analytics
• We expect to see much innovation in text analytics during the next few years
• The deliverable is a better and richer search index
19. 19
An Established Architecture
[Diagram: the Hadoop-based architecture, as on the previous slide]
• Google.com has worked something like this
since 2004
20. 20
An Integrated Search/Analytics Architecture
[Diagram labels: Hadoop; Content Sources (CMS, File system, etc.); Connectors; Data Sources (Data Warehouse, Logfiles, etc.); ETL; Content Processing; Secure Cache; Iterative Development; Rapid Indexing; two Search Apps; two Analysis Apps]
• Encourages agile exploitation of data and content resources
21. 21
Summary 1
• Search and Big Data applications are tending towards the
same architecture
• Autonomous connectivity and content processing simplifies
and de-risks – if you can get it right
• The foundation of great search is still a clean, rich and
detailed index
• The “search index” itself is a mature technology, almost a
commodity
• Much of the innovation during the next few years will be in
text analytics, and other methods of preparing content
prior to indexing
22. 22
The compulsory analyst quote….
And finally….
“Enterprise Search Can Bring Big Data Within Reach”
• Multiple, purpose-built indexes that are derived from enriched
content are necessary.
http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog
23. 23
The Enterprise Search Market in a
Nutshell
Iain Fletcher
ifletcher@Searchtechnologies.com
October 20, 2015
Questions?
26. 26
Where is the Focus?
• The Business View
• The Implementation View
[Two diagrams, one per view, each with the labels: Content Capture & Preparation; Data Store / Index; Application]