Enterprise search aims to identify and enable content from multiple enterprise sources to be indexed, searched, and displayed. It faces challenges like unifying diverse data sources, identifying relevant information in real-time, and providing action-oriented insights. Machine learning techniques can help by automatically classifying and clustering data, extracting entities and sentiments, and personalizing search results. Case studies demonstrate how enterprise search has helped organizations in healthcare, telecommunications, finance, and sports improve productivity, customer service, and data-driven insights.
3. Foundational methodology
If we toss a coin 100 times and get heads every time, what’s the probability of getting a head on the 101st toss?
Traditional probability: 50%
Bayesian inference: 99+%
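The 99+% answer is consistent with Laplace's rule of succession. The slide does not state a prior, so the sketch below assumes a uniform Beta(1, 1) prior on the coin's bias. Under that assumption, the posterior predictive probability of heads after h heads in n tosses is (h + 1) / (n + 2). The function names are illustrative, not from the source.

```python
from fractions import Fraction


def classical_fair_coin() -> Fraction:
    """Traditional answer: a fair coin has no memory, so P(heads) stays 1/2."""
    return Fraction(1, 2)


def bayesian_predictive_heads(heads: int, tosses: int,
                              prior_a: int = 1, prior_b: int = 1) -> Fraction:
    """Posterior predictive P(heads on the next toss) under a Beta(prior_a, prior_b) prior.

    Observing `heads` heads in `tosses` tosses updates the prior to
    Beta(prior_a + heads, prior_b + tosses - heads); the posterior mean is the
    probability that the next toss lands heads.
    """
    if not 0 <= heads <= tosses:
        raise ValueError("heads must be between 0 and tosses")
    return Fraction(prior_a + heads, prior_a + prior_b + tosses)


if __name__ == "__main__":
    p = bayesian_predictive_heads(heads=100, tosses=100)
    print(f"classical: {classical_fair_coin()}, Bayesian: {p} (about {float(p):.2%})")
```

With the uniform prior the answer is 101/102, just over 99%. A prior that strongly favors a fair coin, such as Beta(50, 50), pulls the estimate back toward 50%: `bayesian_predictive_heads(100, 100, 50, 50)` gives 3/4.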
5. What is enterprise search and what are its challenges?
Enterprise search is a means of identifying and enabling content from multiple enterprise-type sources to be indexed, searched, and displayed to a defined audience. An effective enterprise search platform should enable productivity.
Challenges: silos, volume and velocity, expectations
6. Productivity depends on effective enterprise search
The Butler Group reports that up to 10% of staff costs are lost because employees are unable to find the right information to do their jobs. (2006)
In a study of more than 1,000 middle managers, Accenture found that managers spend up to 2 hours a day searching for information, and that more than 50% of the information they obtain has no value to them. (2007)
According to the New York Times, data scientists spend 50-80% of their time collecting and preparing data. (2014)
An Aberdeen Group study of 188 organizations that had implemented enterprise search found that executives at the top-performing companies saved 6 hours a week of searching for information, compared with 1 hour for executives at the other companies. (2009)
8. The data landscape is radically changing
More connected people, apps, and things are generating more data in many forms:
– Business data
– Human data
– Machine data
Callout: 10x faster growth than traditional business data
9. Why is processing human data different?
– Human Information is made up of ideas, is diverse, and has context.
– Ideas don’t match exactly the way data does; they have distance (degrees of similarity).
– Human Information is not static; it’s dynamic and lives everywhere.
Sources: mobile, texts, email, audio, video, social media, transactional data, documents, search engine, images, IT/OT
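The point that ideas "have distance" rather than matching exactly can be illustrated with a similarity score. The sketch below uses bag-of-words cosine similarity as a deliberately simple stand-in. The function names and sample texts are hypothetical, and production systems typically use richer representations.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-case bag-of-words term counts (a deliberately simple representation)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two term-count vectors: 1.0 = same terms, 0.0 = none shared."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = "late delivery complaint"
    for doc in ["customer complaint about late delivery",
                "delivery was late again",
                "quarterly revenue report"]:
        print(f"{cosine_similarity(query, doc):.2f}  {doc}")
```

Neither of the first two documents is an exact match for the query, but both score closer to it than the unrelated report does. This graded closeness is what "distance" means here.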
10. Enterprise Search: Let me Google that for you
Web vs. enterprise search:
– Content. Web: web pages; largely homogeneous. Enterprise: variety of data sources and file formats; heterogeneous.
– Relevance. Web: tolerates a large number of results, as well as duplicated or overlapping information. Enterprise: demands a small number of unique results with a high degree of specificity.
– Personalization. Web: little personalization expected; users expect a list of returned results. Enterprise: expectation of customized results (data access) aligned with user profiles (role, group, projects, etc.).
– Analysis. Web: generic. Enterprise: domain-specific.
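The enterprise relevance requirement, a small number of unique results, implies collapsing near-duplicates before display. One common approach compares overlapping word windows (shingles) with Jaccard similarity. The sketch below is a minimal version of that approach. The 3-word shingle size, the 0.6 threshold, and all names are illustrative assumptions, not taken from the slide.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles (overlapping word windows) for a document."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dedupe_results(results: list[str], threshold: float = 0.6, k: int = 3) -> list[str]:
    """Keep results in rank order, dropping any near-duplicate of a result already kept."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for doc in results:
        s = shingles(doc, k)
        if all(jaccard(s, prev) < threshold for prev in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


if __name__ == "__main__":
    ranked = ["Travel expense policy (revised)",
              "Travel expense policy - revised",
              "Holiday calendar for the Boston office"]
    print(dedupe_results(ranked))
```

Processing results in rank order means the highest-ranked copy of each near-duplicate group survives.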
11. Big data requirements for enterprise search
Requirements and benefits:
1. Unifying diverse sets of data: allows users to ask questions that haven’t been asked before.
2. Identifying what’s relevant: increases productivity by streamlining search, so users can focus on transforming and extracting the right data for analysis.
3. Automatic and real-time: content is automatically indexed and available for search, enabling users to find data almost as quickly as it’s captured.
4. Action-oriented / insight-driven: maximizes return on human capital.
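The "automatic and real-time" requirement depends on incremental indexing, where each new document becomes searchable as soon as it is captured. The sketch below is a minimal in-memory inverted index that illustrates the idea. The class, its methods, and the sample document IDs are hypothetical. Real enterprise indexes add persistence, ranking, and access control.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Minimal in-memory inverted index: term -> set of document IDs."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)
        self._docs: dict[str, str] = {}

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id: str, text: str) -> None:
        """Index (or re-index) one document; it is searchable as soon as this returns."""
        if doc_id in self._docs:
            self.remove(doc_id)
        self._docs[doc_id] = text
        for term in set(self._tokenize(text)):
            self._postings[term].add(doc_id)

    def remove(self, doc_id: str) -> None:
        """Drop a document and clean up any postings lists it leaves empty."""
        text = self._docs.pop(doc_id, None)
        if text is None:
            return
        for term in set(self._tokenize(text)):
            self._postings[term].discard(doc_id)
            if not self._postings[term]:
                del self._postings[term]

    def search(self, query: str) -> set[str]:
        """Return IDs of documents containing every query term (Boolean AND)."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("crm-17", "Customer escalation: router firmware")
    idx.add("wiki-3", "Router firmware upgrade guide")
    print(sorted(idx.search("router firmware")))
```

Because postings are updated inside `add`, a search issued right after `add` already sees the new document. No separate batch rebuild is needed.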
12. Tackling big data requirements for enterprise search
Requirements and how they are addressed:
1. Unifying diverse sets of data
– Create a single view of enterprise content by connecting to different sources and repositories
2. Identifying what’s relevant
– Data streamlining
– Automatic query guidance
– Intelligent summarization
– Intelligent highlighting
– Personalization
– Classification and clustering
3. Automatic and real-time
– Handled via the indexing protocol; not directly visible to end users
4. Action-oriented / insight-driven
– Concept navigation / visualizations
– Eduction
– Sentiment
– Classification and clustering
– Machine learning
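Intelligent highlighting and summarization usually mean showing the most query-relevant passage, with the matching terms marked. The sketch below picks the fixed-size word window that contains the most query-term hits and wraps each hit in tags. The window size, the bracket tags, and the function name are illustrative assumptions. This is a simple heuristic, not any product's method.

```python
import re


def highlight_snippet(text: str, query: str, window: int = 8,
                      start_tag: str = "[", end_tag: str = "]") -> str:
    """Return the `window`-word span with the most query-term hits, matches wrapped in tags."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    words = text.split()
    if not words:
        return ""
    # Mark each word that matches a query term once punctuation is stripped.
    hits = [re.sub(r"\W+", "", w).lower() in terms for w in words]
    # Slide a fixed-size window; max() keeps the earliest start with the most hits.
    starts = range(max(1, len(words) - window + 1))
    best = max(starts, key=lambda i: sum(hits[i:i + window]))
    span = [f"{start_tag}{w}{end_tag}" if hits[best + j] else w
            for j, w in enumerate(words[best:best + window])]
    prefix = "... " if best > 0 else ""
    suffix = " ..." if best + window < len(words) else ""
    return prefix + " ".join(span) + suffix


if __name__ == "__main__":
    doc = ("The quarterly report covers revenue. Router firmware issues "
           "caused most support escalations last month.")
    print(highlight_snippet(doc, "firmware escalations", window=6))
```

Choosing the densest window, rather than the first match, tends to surface the passage where the query terms co-occur.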
13. Personalizing data
– Implicit and explicit profiling
– Relationship discovery / community and expertise networks
– Intent-based ranking
Example relationships:
– Customer C is linked to Customer E via Customer D
– Customer H is the most influential in Customer B’s network
– Customer A is in Customer B’s network
– Customers F and G purchased the same model last year
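The example relationships map onto basic graph operations: a shortest path ("C is linked to E via D"), reachability ("A is in B's network"), and a centrality measure ("H is the most influential"). The sketch below works on a hypothetical edge list and uses degree centrality as the influence measure. That measure is an assumption, since the slide does not say how influence is computed.

```python
from collections import defaultdict, deque

# Hypothetical customer relationships (undirected), loosely mirroring the slide's examples.
EDGES = [("A", "B"), ("B", "H"), ("H", "I"), ("H", "J"), ("B", "K"),
         ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: fewest-hop path from start to goal, or None if unconnected."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(graph.get(node, ())):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


def network_of(graph, person):
    """Everyone reachable from `person` (their connected network), excluding `person`."""
    seen = {person}
    queue = deque([person])
    while queue:
        for neighbor in graph.get(queue.popleft(), ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {person}


def most_influential(graph, person):
    """Member of person's network with the most direct connections (degree centrality)."""
    members = sorted(network_of(graph, person))
    return max(members, key=lambda n: len(graph[n]), default=None)


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(shortest_path(g, "C", "E"), most_influential(g, "B"))
```

Other centrality measures, such as betweenness or PageRank, could replace degree where "influence" means brokering connections rather than simply having many of them.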
14. Classification and Clustering
Example categories: product performance issues, side letters, off-balance-sheet transactions
– Managed classification: create categories using business rules or training.
– Automatic classification and clustering: automatically determine categories based on patterns and relationships in information.
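The two approaches can be contrasted in a few lines. Managed classification applies categories defined up front, modeled below as hypothetical keyword rules. Automatic clustering discovers groups from the content itself, modeled below with the simple single-pass "leader" algorithm over Jaccard similarity of term sets. The rules, the 0.3 threshold, and the function names are illustrative assumptions, not the system the slide describes.

```python
import re

# Hypothetical business rules for managed classification: category -> trigger phrases.
RULES = {
    "Side letters": ["side letter", "side agreement"],
    "Off-balance-sheet transactions": ["off-balance-sheet", "off balance sheet",
                                       "special purpose entity"],
    "Product performance issues": ["defect", "failure rate", "recall"],
}


def classify(text: str, rules: dict[str, list[str]] = RULES) -> list[str]:
    """Managed classification: every category whose trigger phrases appear in the text."""
    lowered = text.lower()
    return [category for category, phrases in rules.items()
            if any(phrase in lowered for phrase in phrases)]


def terms(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def leader_cluster(docs: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Automatic clustering (single-pass leader algorithm): each document joins the first
    cluster whose leader has Jaccard similarity >= threshold, else it starts a new cluster."""
    clusters: list[tuple[set[str], list[str]]] = []
    for doc in docs:
        doc_terms = terms(doc)
        for leader_terms, members in clusters:
            union = leader_terms | doc_terms
            if union and len(leader_terms & doc_terms) / len(union) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append((doc_terms, [doc]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    print(classify("Board memo mentions a side letter with the distributor"))
    print(leader_cluster(["pump seal failure in unit 4",
                          "pump seal failure reported again",
                          "side letter signed with distributor",
                          "new side letter with distributor"]))
```

The leader algorithm is order-dependent and needs a threshold choice. It is shown here for brevity, and methods such as k-means or hierarchical clustering are common alternatives.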
15. Eduction and Sentiment
Example: “I stayed at the resort last week, and though the mattresses were very nice, the service was awful.”
Entity types: names, places, IP addresses, companies, events, relationships, medicines, airports, cars, Social Security numbers, phone numbers, credit cards, dates, holidays, job titles, currencies
Eduction: apply structure to unstructured data by automatically identifying and extracting terms in documents that lend themselves to key fields.
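A few of the entity types listed above have regular surface forms, so a pattern-based extractor is enough to show how free text becomes key/value fields. The sketch below is a simplified illustration. It assumes US-style phone and Social Security number formats and ISO dates, and the patterns and names are hypothetical. It does not reproduce any product's eduction grammars.

```python
import re

# Simplified, illustrative patterns (assumptions: US-style phone/SSN formats, ISO dates).
PATTERNS = {
    "ip_address": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "date": re.compile(r"\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b"),
    "phone_number": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def educe(text: str) -> dict[str, list[str]]:
    """Pull typed fields out of free text, giving unstructured prose a key/value structure."""
    return {field: pattern.findall(text) for field, pattern in PATTERNS.items()}


if __name__ == "__main__":
    note = "Ticket opened 2016-03-14 from host 10.0.12.7; call back at (555) 123-4567."
    for field, values in educe(note).items():
        print(field, values)
```

Entity types without a fixed shape, such as names, companies, or relationships, generally need dictionaries or trained models rather than patterns.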
Sentiment: decomposition and classification within a sentence to pull out the sentiment surrounding specific topics.
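The resort sentence shows why sentence-level sentiment is not enough: "mattresses" is positive while "service" is negative. The sketch below applies a deliberately tiny lexicon approach. It splits the sentence into clauses, scores each clause, and assigns that score to the topics (aspects) the clause mentions. The word lists and function name are hypothetical, and real systems use much larger, domain-tuned resources or trained models.

```python
import re

# Tiny illustrative lexicons (hypothetical; real systems use far larger, domain-tuned resources).
POSITIVE = {"nice", "great", "clean", "friendly", "comfortable"}
NEGATIVE = {"awful", "rude", "dirty", "slow", "terrible"}
ASPECTS = {"mattresses", "service", "room", "staff", "food"}


def aspect_sentiment(sentence: str) -> dict[str, str]:
    """Split a sentence into clauses and attach each clause's polarity to the aspects it mentions."""
    results: dict[str, str] = {}
    clauses = re.split(r"[,;]|\bbut\b|\bthough\b", sentence.lower())
    for clause in clauses:
        words = re.findall(r"[a-z]+", clause)
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for word in words:
            if word in ASPECTS:
                results[word] = polarity
    return results


if __name__ == "__main__":
    review = ("I stayed at the resort last week, and though the mattresses "
              "were very nice, the service was awful.")
    print(aspect_sentiment(review))
```

Splitting on clause boundaries and contrastive words such as "though" and "but" keeps the praise for one topic from cancelling out the complaint about another.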
16. Intelligent search with Machine Learning
– Document interpretation / topic and concept identification
– Sentiment analysis
– Query analysis / clustering
– Personalization of content / recommendations
– Categorization / classification of data
– Entity identification
– Ranking results
– Auto-complete / directed navigation
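Ranking results is the capability on this list that users see most directly. The sketch below shows TF-IDF, a classic baseline. Each document is scored by how often it uses the query terms, with each term weighted higher the rarer it is across the collection. The smoothed IDF formula and the sample document IDs are illustrative choices. Learned ranking models would typically build on signals like this rather than use it alone.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document by the summed TF-IDF weight of the query terms; highest first."""
    doc_terms = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in doc_terms.items():
        length = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in doc_terms.values() if term in c)
            if df == 0 or term not in counts:
                continue
            idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed inverse document frequency
            score += (counts[term] / length) * idf       # length-normalized term frequency
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    kb = {"kb-1": "reset router password",
          "kb-2": "router firmware upgrade steps for router models",
          "kb-3": "holiday schedule"}
    for doc_id, score in rank("router firmware", kb):
        print(f"{score:.3f}  {doc_id}")
```

"kb-2" ranks first because it is the only document containing the rarer term "firmware," even though it is longer than "kb-1."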
18. What else are users asking for?
– Improved treatment of poor-quality data
– More interactive search / digital assistants
– Streamlined / better-defined workflows
– Better visualization / user experience
Figure labels: Extract, Analyze, Connect, Index, Search, Predict
20. Stanford Children’s Health
Research for healthcare provider ranking study
Challenge
– Quality and clinical effectiveness research on ~115K patients, ~390K encounters, and ~3M documents
– Diverse data types (structured and unstructured) across the data silos involved
– Time constraints vs. extensive search scope
Result
– Cross-patient search for cohort identification
– Intuitive UI for simple query construction
– Easy clinical note review with highlights, navigation, and related concepts
– Portable queries and results
– Fast indexing
21. Leading Chinese telecom
Communications service provider industry
Challenge
– Allow users to access information on thousands of public services directly from their mobile phones; the platform’s success depends on users’ ability to find information quickly
Result
– More than 740 million subscribers can search more than 8,000 applications for public service information, including public transportation schedules, public health records, traffic offenses, and more
– Users receive more accurate search results than ever before
– Customers get the most relevant and useful information regardless of the terms they use in their searches
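The slide does not say how the provider returned relevant results "regardless of the terms" users typed. One common technique for this is synonym-based query expansion, where each query term also matches its synonyms. The sketch below shows that technique with hypothetical English synonym groups for readability. A Chinese deployment would also need word segmentation, which is not shown.

```python
import re

# Hypothetical synonym groups; real deployments curate or learn these per language and domain.
SYNONYM_GROUPS = [
    {"bus", "coach", "transit"},
    {"timetable", "schedule"},
    {"fine", "penalty", "ticket"},
]


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def expand_query(query: str, groups=SYNONYM_GROUPS) -> list[set[str]]:
    """Replace each query term with the set of terms that should count as a match for it."""
    expanded = []
    for term in tokenize(query):
        alternatives = {term}
        for group in groups:
            if term in group:
                alternatives |= group
        expanded.append(alternatives)
    return expanded


def matches(document: str, expanded: list[set[str]]) -> bool:
    """True if, for every query term, the document contains that term or a synonym of it."""
    words = set(tokenize(document))
    return all(alternatives & words for alternatives in expanded)


if __name__ == "__main__":
    q = expand_query("bus timetable")
    print(matches("Transit schedule for route 12", q))
```

Storing synonyms as symmetric groups, rather than one-way mappings, means a query for "schedule" also finds documents that say "timetable."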
22. Leading financial software, data and media company
Subscribers require up-to-the-second information on market conditions and trends
Challenge
– Deliver search performance at the scale its data repository requires (200 million messages; 15-20 million chats daily)
– Provide a robust, cost-efficient solution that scales with a large and growing volume of data, supported by a small IT headcount
Result
– Detects trends in real-time messages and chats for subscribers
– Accommodates more than 10 billion document entries today without compromising performance
– Ensures the solution can scale to deliver ROI in the future
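The slide does not describe how trends were detected in the message stream. A simple, commonly used baseline is to compare how often each term appears in a recent window of messages against a baseline window, and to flag terms whose share has jumped. The sketch below implements that comparison. The `min_count` and `ratio` thresholds, the add-one smoothing, and the sample messages are illustrative assumptions.

```python
import re
from collections import Counter


def _term_counts(messages: list[str]) -> Counter:
    """Number of messages each term appears in (each term counted once per message)."""
    counts = Counter()
    for message in messages:
        counts.update(set(re.findall(r"\w+", message.lower())))
    return counts


def trending_terms(baseline: list[str], current: list[str],
                   min_count: int = 3, ratio: float = 3.0) -> list[tuple[str, float]]:
    """Terms whose share of messages in `current` is at least `ratio` times their baseline share.

    The baseline share uses add-one smoothing so a term never seen before can still trend.
    """
    base, now = _term_counts(baseline), _term_counts(current)
    n_now = max(len(current), 1)
    trending = []
    for term, count in now.items():
        if count < min_count:
            continue
        lift = (count / n_now) / ((base[term] + 1) / (len(baseline) + 1))
        if lift >= ratio:
            trending.append((term, lift))
    return sorted(trending, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    baseline = ["rates steady today"] * 10
    current = ["rates steady", "opec cut surprise", "opec cut", "opec headline"]
    print(trending_terms(baseline, current))
```

In a live system the two windows would slide forward continuously, for example over the last hour versus the previous day. The `min_count` floor keeps one-off terms from being flagged.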
23. Leading American multinational telecom
Paying careful attention to every aspect of customer-facing processes and applications
Challenge
– Provide support desk staff with fast access to the precise information required to address a customer’s problem
– Improve knowledge management system search capabilities
Result
– Reduced time-to-resolution: fast queries ensure support experts can resolve customer issues quickly
– Relevant results: query functionality ensures that results deliver the information most likely to resolve customer issues
24. NASCAR
Fan and Media Engagement Center
Challenge
– Economic conditions
– Rapidly changing media landscape (social media growth)
– Revenue pressures from sponsors
– Industry leadership expectations
Result
– Live monitoring and analysis of broadcast, news, and social media
– Sponsors’ brand and fan sentiment analyses
– Analytics to support race team sponsorship renewals
– Crisis management
– Building the fan base through active engagement