Read this and get inspired by how you can use entity extraction to better consume the unstructured information assets in your organization. This presentation was made by my colleague Paula Petcu.
3. Scenarios | Benefits of using entity extraction
Explore your content
Explore the enterprise graph
Discover insights about your products
Monitor trends
Discover new expertise inside your organization
Find the people with the right competences
Enhance search navigation
Filter unstructured data
4. Scenarios | Benefits of using entity extraction
Prevent duplicate work
Find similar content
Help your users find their dream home
Extract potential decision criteria from natural language
Visualize your content in a new way
Enrich documents with metadata
6. Motivation
• Search for “usability”
• Only people who have tagged themselves with “usability” will be returned
• If we rely only on standard category types and database information, we get only what is stored in that person's database record
• But what if you could also find those who write, blog, or tweet about “usability” without being explicitly tagged with this category?
7. Enhanced search index
• The search index is enriched with information about the topics, keywords, people, places, etc. that authors write about
8. Enhanced expertise search
• Search for “usability”
• Get improved search results
Discover the competences people have
Discover interests people have and share
Gather all people writing about the same topic
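A minimal sketch of the enhanced expertise search, assuming a small hand-written topic dictionary in place of a real extraction pipeline: topics found in what people write are added to an index, and that index is merged with the tags people set on their own profiles. The names `TOPICS`, `extract_topics`, `build_expertise_index` and `find_experts` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical topic dictionary; a real setup would use an entity extraction framework.
TOPICS = {"usability", "accessibility", "search"}


def extract_topics(text: str) -> set[str]:
    """Return the dictionary topics mentioned in the text (whole words, case-insensitive)."""
    return {t for t in TOPICS if re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)}


def build_expertise_index(documents: list[dict]) -> dict[str, set[str]]:
    """Map each extracted topic to the authors who wrote about it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc in documents:
        for topic in extract_topics(doc["body"]):
            index[topic].add(doc["author"])
    return index


def find_experts(index: dict[str, set[str]], profiles: dict[str, set[str]], topic: str) -> set[str]:
    """Merge people explicitly tagged with the topic and people who write about it."""
    tagged = {person for person, tags in profiles.items() if topic in tags}
    return tagged | index.get(topic, set())


documents = [
    {"author": "Anna", "body": "Notes from our usability tests of the intranet."},
    {"author": "Bo", "body": "Blog post: five Usability heuristics I rely on."},
    {"author": "Carl", "body": "Quarterly budget report."},
]
profiles = {"Dana": {"usability"}, "Carl": {"finance"}}

experts = find_experts(build_expertise_index(documents), profiles, "usability")
print(sorted(experts))  # ['Anna', 'Bo', 'Dana']
```

Only Dana is tagged with “usability”; Anna and Bo are found because of what they wrote.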
10. Motivation
• Search for “yoga”
• Lots of semi-structured documents (HTML, Word, PDF, etc.)
• Some are missing administrative metadata such as author and date last saved
• Some are missing descriptive metadata such as title, topic, tags and category
No proper title
Will you go through all results to find the relevant ones?
11. Extract named entities and metadata
• Identify information such as title, keywords, author, summary and subsection titles, and add it to the document
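A rough sketch of deriving missing descriptive metadata from raw document text. It assumes a simple layout heuristic (short lines without closing punctuation are headings) and frequency-based keywords; the function names and the stopword list are hypothetical, and a real framework would combine this with trained models.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with", "by"}


def is_heading(line: str) -> bool:
    """Hypothetical heuristic: a short line without closing punctuation is a heading."""
    return 0 < len(line.split()) <= 8 and not line.endswith((".", "!", "?", ":"))


def extract_metadata(text: str, n_keywords: int = 3) -> dict:
    """Derive title, subsection titles, keywords and a one-sentence summary from raw text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    headings = [line for line in lines if is_heading(line)]
    body = " ".join(line for line in lines if not is_heading(line))
    words = [w for w in re.findall(r"[a-z]+", body.lower()) if w not in STOPWORDS]
    sentences = re.split(r"(?<=[.!?])\s+", body)
    return {
        "title": headings[0] if headings else None,
        "subsection_titles": headings[1:],
        "keywords": [word for word, _ in Counter(words).most_common(n_keywords)],
        "summary": sentences[0] if body else None,
    }


document = """Yoga for beginners
Breathing
Yoga starts with breathing. Slow breathing calms the body.
Poses
Simple yoga poses build strength. Repeat the poses daily."""

metadata = extract_metadata(document)
print(metadata["title"])              # Yoga for beginners
print(metadata["subsection_titles"])  # ['Breathing', 'Poses']
print(metadata["keywords"])           # ['yoga', 'breathing', 'poses']
print(metadata["summary"])            # Yoga starts with breathing.
```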
12. New filters and improved metadata
• Search for “yoga”
• The newly created data is used to filter documents and improve relevance
Improved visual results (documents have titles)
Improved relevance (titles and subsection titles are ranked higher than body text)
Possibility to filter on authors, topics, places, etc. (use the filter rather than pagination)
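The two effects on this slide, field boosting and filtering on extracted facets, can be sketched in a few lines. The weights in `FIELD_WEIGHTS` and the document fields are assumptions for illustration; a search engine would express the same idea through its own field-boost and filter settings.

```python
import re
from typing import Optional

# Hypothetical field weights: titles and subsection titles outrank body text.
FIELD_WEIGHTS = {"title": 3.0, "subsection_titles": 2.0, "body": 1.0}


def term_count(text: str, term: str) -> int:
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE))


def score(doc: dict, term: str) -> float:
    """Weighted sum of term occurrences per field."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = doc.get(field, "")
        text = " ".join(value) if isinstance(value, list) else value
        total += weight * term_count(text, term)
    return total


def matches(doc: dict, filters: dict) -> bool:
    """True if every filter value equals (or is contained in) the document's facet."""
    for field, wanted in filters.items():
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        if wanted not in values:
            return False
    return True


def search(docs: list[dict], term: str, filters: Optional[dict] = None) -> list[dict]:
    """Filter on facets first, then rank the remaining hits by weighted score."""
    filters = filters or {}
    scored = [(score(d, term), d) for d in docs if matches(d, filters)]
    scored = [pair for pair in scored if pair[0] > 0]
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]


docs = [
    {"id": 1, "title": "Yoga for beginners", "subsection_titles": ["Breathing"],
     "body": "Some text.", "author": "Anna", "places": ["Copenhagen"]},
    {"id": 2, "title": "Weekly newsletter", "subsection_titles": [],
     "body": "We tried yoga once. Yoga is fun.", "author": "Bo", "places": ["Aarhus"]},
    {"id": 3, "title": "Canteen menu", "subsection_titles": [],
     "body": "No relevant text.", "author": "Anna", "places": []},
]

ranked = [d["id"] for d in search(docs, "yoga")]
in_aarhus = [d["id"] for d in search(docs, "yoga", {"places": "Aarhus"})]
print(ranked)     # [1, 2]
print(in_aarhus)  # [2]
```

One title hit (weight 3) outranks two body hits (weight 2 in total), which is the relevance change described above.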
14. Motivation
• Search for ‘Copenhagen’ on your intranet
• Ambiguous query
• Lots of results
• Missing context
• What is the user intent behind this query?
15. Relationship Extraction for Entities
• Extract relations from unstructured data
• Built upon named entity recognition
• Relationship extraction enables us to build a graph search solution with unstructured data
Figure: example text (Lorem ipsum placeholder) with the entities Sarah Jensen, Carl Sorensen, Anders Anderson, Philadelphia, Copenhagen, Findwise, Google and Microsoft annotated, and the same entities shown as nodes of a graph
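The slides do not say how relations are extracted. The sketch below uses the simplest variant, sentence-level co-occurrence: entities found by a hypothetical dictionary (`GAZETTEER`) in the same sentence are linked by an edge, and repeated co-occurrence increases the edge weight. Production systems usually go further and classify the relation type (for example works-at or located-in).

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer mapping surface forms to entity types.
GAZETTEER = {
    "Sarah Jensen": "person", "Carl Sorensen": "person", "Anders Anderson": "person",
    "Philadelphia": "place", "Copenhagen": "place",
    "Findwise": "company", "Google": "company", "Microsoft": "company",
}


def entities_in(sentence: str) -> list[str]:
    """Named entities from the gazetteer that occur in the sentence, sorted by name."""
    return sorted(name for name in GAZETTEER if name in sentence)


def extract_relations(text: str) -> Counter:
    """Count one edge for every pair of entities mentioned in the same sentence."""
    edges: Counter = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for a, b in combinations(entities_in(sentence), 2):
            edges[(a, b)] += 1
    return edges


text = ("Sarah Jensen works at Findwise in Copenhagen. "
        "Carl Sorensen visited Google in Philadelphia. "
        "Sarah Jensen met Carl Sorensen.")

edges = extract_relations(text)
print(edges[("Findwise", "Sarah Jensen")])  # 1
print(len(edges))                           # 7
```

The resulting weighted edge list is enough to load into a graph store and traverse from any entity.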
16. Suggestions as you type, using the graph
• Search for ‘Copenhagen’ on your intranet
Narrow down search results directly from the search box
Disambiguate the query by selecting one of the different types of suggestions (consultants, projects, partners)
Navigate directly to 2nd or higher level connections on the graph
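A sketch of typed suggestions and multi-hop navigation, assuming a tiny in-memory graph. The node names, the `NODE_TYPES` and `EDGES` tables, and the helper functions are hypothetical; the point is that prefix matches are grouped by entity type, and connections at depth 2 or more come from a breadth-first walk.

```python
from collections import defaultdict

# Hypothetical graph: node -> type, plus undirected edges.
NODE_TYPES = {
    "Copenhagen": "place",
    "Copenhagen Airport project": "project",
    "Copenhagen Partners A/S": "partner",
    "Sarah Jensen": "consultant",
    "Carl Sorensen": "consultant",
}
EDGES = [
    ("Copenhagen", "Sarah Jensen"),
    ("Sarah Jensen", "Copenhagen Airport project"),
    ("Copenhagen Airport project", "Carl Sorensen"),
]


def adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def suggest(prefix: str) -> dict[str, list[str]]:
    """Group entity names that start with the typed prefix by entity type."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, kind in sorted(NODE_TYPES.items()):
        if name.lower().startswith(prefix.lower()):
            groups[kind].append(name)
    return dict(groups)


def connections(node: str, depth: int, graph: dict[str, set[str]]) -> set[str]:
    """Nodes reachable within `depth` hops (breadth-first), excluding the start node."""
    seen, frontier = {node}, {node}
    for _ in range(depth):
        frontier = {n for f in frontier for n in graph[f]} - seen
        seen |= frontier
    return seen - {node}


graph = adjacency(EDGES)
suggestions = suggest("cop")
print(suggestions)
print(connections("Copenhagen", 2, graph))  # Sarah Jensen and the airport project
```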
17. Business Intelligence, using the graph
• Search for: ‘Customers where we have done projects based on Google technology with at least 1000 hours of consulting time and a revenue of more than 1 MDKK, and where the word “e-commerce” is mentioned many times in the project documentation’
Business Intelligence
Project numbers (worked hours)
Financial numbers (revenue, profits)
Project Documentation
What would this query look like in SQL?
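A sketch of how this query combines the three sources once entities have been extracted from the documentation. The `Project` record, its fields, and the reading of “many times” as at least three mentions (`min_mentions=3`) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Project:
    customer: str
    technologies: set[str]  # assumed to be extracted from the project documentation
    hours: int              # from project numbers
    revenue_dkk: float      # from financial numbers
    documentation: str      # unstructured text


def ecommerce_mentions(text: str) -> int:
    return text.lower().count("e-commerce")


def customers_matching(projects: list[Project], technology: str = "Google",
                       min_hours: int = 1000, min_revenue_dkk: float = 1_000_000,
                       min_mentions: int = 3) -> list[str]:
    """Customers whose projects satisfy every structured and text-based condition."""
    return sorted({
        p.customer for p in projects
        if technology in p.technologies
        and p.hours >= min_hours
        and p.revenue_dkk > min_revenue_dkk
        and ecommerce_mentions(p.documentation) >= min_mentions
    })


projects = [
    Project("Customer A", {"Google"}, 1200, 1_500_000,
            "E-commerce platform. e-commerce search. e-commerce analytics."),
    Project("Customer B", {"Google"}, 800, 2_000_000, "e-commerce " * 5),
    Project("Customer C", {"Microsoft"}, 5000, 3_000_000, "e-commerce " * 5),
    Project("Customer D", {"Google"}, 1500, 1_200_000, "Intranet search."),
]

print(customers_matching(projects))  # ['Customer A']
```

The structured conditions map easily to SQL; the technology and mention conditions are only available because they were extracted from text.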
19. Motivation
• Search for the product name ‘Tusin’
• The product is mentioned in different sources, in different contexts (user feedback, marketing material, internal specifications), and using different terminology (on social media compared to the website)
• How do you keep track of all the information?
• How easy is it to identify trends?
20. Identify the same product in different contexts
• Identify the entity denoting the same product across different sources
Figure: the product Tusin, also known internally as Azure, is identified in a Word doc (internal production specification), a PDF doc (product marketing material from the website), user comments (feedback about the marketing material or the user experience), user tweets mentioning the product, product videos (video views, video comments), user feedback metrics, task items in the internal issues management system, and internal news
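A minimal sketch of resolving different surface forms to one product entity, assuming a hand-maintained alias table (`ALIASES`) that maps both the public name and the internal name to a canonical ID. The source names are hypothetical. Note that a name such as “Azure” also denotes other things, so a real system would need context to disambiguate colliding aliases.

```python
import re
from collections import defaultdict

# Hypothetical alias table: every known surface form maps to one canonical product ID.
ALIASES = {"tusin": "product:tusin", "azure": "product:tusin"}


def resolve_mentions(text: str) -> set[str]:
    """Canonical product IDs mentioned in the text under any alias."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {ALIASES[t] for t in tokens if t in ALIASES}


def group_by_product(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each canonical product ID to the sources that mention it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for source, text in sources.items():
        for product in resolve_mentions(text):
            groups[product].append(source)
    return dict(groups)


sources = {
    "production_spec.docx": "Azure v2 production specification",
    "marketing.pdf": "Meet Tusin, our new product",
    "tweet": "Loving my #Tusin so far!",
    "internal_news": "Canteen closed on Friday",
}

groups = group_by_product(sources)
print(groups)  # {'product:tusin': ['production_spec.docx', 'marketing.pdf', 'tweet']}
```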
21. Monitor trends on your products
• Search for ‘Tusin’
or
• Save it as a search term and create a dashboard with content driven by search
Monitor trends
Reduce time spent replying to customers or users
Stay competitive
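A sketch of the trend signal behind such a dashboard: count mentions of a saved search term per ISO week and flag a week that clearly exceeds the earlier average. The `factor=2.0` rule is an arbitrary assumption, and weeks with no mentions are simply absent from the counts in this simplified version.

```python
from collections import Counter
from datetime import date


def weekly_mentions(items: list[tuple[date, str]], term: str) -> dict[tuple[int, int], int]:
    """Count items mentioning the term, keyed by (ISO year, ISO week)."""
    counts: Counter = Counter()
    for published, text in items:
        if term.lower() in text.lower():
            year, week, _ = published.isocalendar()
            counts[(year, week)] += 1
    return dict(sorted(counts.items()))


def is_trending(counts: dict[tuple[int, int], int], factor: float = 2.0) -> bool:
    """Hypothetical rule: the latest week has at least `factor` times the earlier average."""
    values = list(counts.values())
    if len(values) < 2:
        return False
    baseline = sum(values[:-1]) / len(values[:-1])
    return values[-1] >= factor * baseline


items = [
    (date(2024, 1, 2), "Tusin launch event"),
    (date(2024, 1, 9), "Question about Tusin"),
    (date(2024, 1, 16), "Tusin is great"),
    (date(2024, 1, 17), "tusin stopped working"),
    (date(2024, 1, 18), "TUSIN support?"),
    (date(2024, 1, 18), "Canteen menu"),
]

counts = weekly_mentions(items, "Tusin")
print(counts)               # {(2024, 1): 1, (2024, 2): 1, (2024, 3): 3}
print(is_trending(counts))  # True
```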
23. Motivation
• You have just started working on a new material at a construction company
• What is the cost of duplicating work?
• Will you search for previous work?
• What if another team has a similar initiative?
24. Enhanced Search Index
• Automatically extract entities and representative keywords from content
Documents
Announcements
Public Emails
Newsfeed
Steel Structures
Glass Type 1.A
Project ANSATorso Tower
Polyethylene Terephthalate
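The slides do not specify how representative keywords are chosen. One common technique, shown here as an illustrative stand-in, is TF-IDF: a term scores high when it is frequent in a document but rare across the collection. The function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(docs: list[str], k: int = 2) -> list[list[str]]:
    """Top-k TF-IDF terms per document (ties broken alphabetically)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        result.append(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return result


docs = [
    "steel structures for the tower steel beams",
    "glass panels for the tower glass facade",
    "polyethylene terephthalate bottles polyethylene terephthalate film for the canteen",
]

keywords = top_keywords(docs)
print(keywords)  # [['steel', 'beams'], ['glass', 'facade'], ['polyethylene', 'terephthalate']]
```

Words such as “for” and “the” appear in every document, so their IDF is zero and they never become keywords.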
25. Prevent duplicate work
• Get suggestions of similar work based on extracted entities
Identify similar work early in the project
Identify potential collaborations
Prevent duplicate work
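A sketch of “similar work” suggestions, assuming each project is represented by the set of entities extracted from its documents and compared with Jaccard similarity (shared entities divided by all entities). The threshold of 0.3 and the project names are assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def similar_projects(new_entities: set[str], existing: dict[str, set[str]],
                     threshold: float = 0.3) -> list[str]:
    """Existing projects whose entity sets overlap enough, most similar first."""
    scored = [(jaccard(new_entities, ents), name) for name, ents in existing.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


existing = {
    "Tower facade project": {"glass type 1.a", "steel structures", "facade"},
    "Canteen packaging": {"polyethylene terephthalate", "packaging"},
}
new_project = {"glass type 1.a", "steel structures", "insulation"}

print(jaccard(new_project, existing["Tower facade project"]))  # 0.5
print(similar_projects(new_project, existing))  # ['Tower facade project']
```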
27. Motivation
• Search for “financial results Copenhagen”
• Search results: documents
• Clicking on a result opens the document
• Does this search answer the user's question?
28. Identify entities in documents
• Identify locations, revenues, departments, etc. in semi-structured and unstructured data
• Combine with data in spreadsheets or databases
Documents
Database
Spreadsheets
Answer
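A sketch of computing an answer rather than returning documents: facts (city, revenue) are extracted from report text with a dictionary and a regular expression, then combined with a structured table such as a spreadsheet of headcount per office. The city list, the “revenue of X MDKK” phrasing and the headcount table are all assumptions.

```python
import re

# Hypothetical extraction rules: a city dictionary and a revenue pattern.
CITIES = ["Copenhagen", "Aarhus", "Stockholm"]
REVENUE = re.compile(r"revenue of ([\d.]+)\s*MDKK", re.IGNORECASE)


def extract_facts(text: str) -> dict:
    city = next((c for c in CITIES if c in text), None)
    match = REVENUE.search(text)
    return {"city": city, "revenue_mdkk": float(match.group(1)) if match else None}


def answer(documents: list[str], headcount: dict[str, int], city: str) -> dict:
    """Sum the extracted revenue for a city and relate it to structured headcount data."""
    facts = [extract_facts(d) for d in documents]
    revenue = sum(f["revenue_mdkk"] for f in facts
                  if f["city"] == city and f["revenue_mdkk"] is not None)
    employees = headcount.get(city, 0)
    return {
        "city": city,
        "revenue_mdkk": revenue,
        "employees": employees,
        "revenue_per_employee_mdkk": revenue / employees if employees else None,
    }


documents = [
    "Q1 report: the Copenhagen office had a revenue of 12.5 MDKK.",
    "Q2 report: the Copenhagen office had a revenue of 7.5 MDKK.",
    "Q1 report: the Aarhus office had a revenue of 4 MDKK.",
]
headcount = {"Copenhagen": 40, "Aarhus": 10}  # e.g. loaded from a spreadsheet

result = answer(documents, headcount, "Copenhagen")
print(result)
```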
29. Visualize your content in a new way
• Search for “financial results Copenhagen”
• Additional information shown
• Can show computed results
Enrich documents with metadata
Visualize the content
Compute answers
Make comparisons
Create dashboards based on searches
30. Help your users find their dream home
Extract potential decision criteria from natural language
31. Motivation
• Searching for an ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’
• The apartment information consists mostly of structured data (m², number of rooms, postal code, floor)
• Can we improve the search experience?
Long list of static filters
Search query consists of an area (postal code, street, etc.)
32. Understanding what the users want
• Here’s how Facebook helps users define their queries:
• Can we interpret the query ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’?
33. Understanding what the users want
• Searching for ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’
• Apartments with 3 rooms are shown in the search results, but those with fewer rooms are not excluded
• Those that mention shopping outlets (such as Netto or Fakta) are boosted
Interpret natural language
Boost results based on ‘preferences’
Better search experience
Increase user satisfaction
Boost those with 3 rooms (the boost can be represented on the map by a bigger pointer)
Free text search
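A sketch of the behaviour described above: the query is interpreted into criteria, and the room count and shopping proximity act as boosts rather than filters, so a 2-room apartment is still returned. Treating the area as a hard filter, the boost values and the regular expressions are assumptions; Netto and Fakta come from the slide.

```python
import re

SHOPPING_TERMS = ("netto", "fakta")  # shop names from the slide, used as boost signals


def interpret(query: str) -> dict:
    """Turn a free-text query into (hypothetical) structured criteria."""
    q = query.lower()
    rooms = re.search(r"(\d+)\s+rooms?", q)
    return {
        "preferred_rooms": int(rooms.group(1)) if rooms else None,
        "near_shopping": "shopping" in q,
        "area": "central copenhagen" if "central copenhagen" in q else None,
    }


def rank(apartments: list[dict], query: str) -> list[dict]:
    intent = interpret(query)

    def score(apt: dict) -> float:
        s = 1.0
        if intent["preferred_rooms"] is not None and apt["rooms"] == intent["preferred_rooms"]:
            s += 1.0  # preference: boost, never exclude
        if intent["near_shopping"] and any(t in apt["description"].lower() for t in SHOPPING_TERMS):
            s += 0.5
        return s

    # Assumption: the area is treated as a hard filter.
    candidates = [a for a in apartments if intent["area"] is None or a["area"] == intent["area"]]
    return sorted(candidates, key=score, reverse=True)


apartments = [
    {"id": "A", "rooms": 2, "area": "central copenhagen", "description": "Bright flat next to Netto"},
    {"id": "B", "rooms": 3, "area": "central copenhagen", "description": "Quiet courtyard"},
    {"id": "C", "rooms": 3, "area": "central copenhagen", "description": "Fakta around the corner, great view"},
    {"id": "D", "rooms": 3, "area": "aarhus", "description": "Near Netto"},
]
query = ("apartment with a good view, located in central Copenhagen, well sized bathroom, "
         "close to shopping outlets, preferably with 3 rooms")

ranked_ids = [a["id"] for a in rank(apartments, query)]
print(ranked_ids)  # ['C', 'B', 'A']
```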
35. Entity Extraction
Entity extraction is the process of identifying named entities (such as locations, people, companies) in a block of text
Add structure to unstructured data
New possibilities for interpreting the data
Improve data quality and findability of documents
Reduce time spent by users manually structuring content
36. Entity Extraction Framework
Combines dictionaries with a trained model and regular expressions, based on needs
Scalable, adaptable and extendable framework
Automatically enrich documents with named entities
Iterative approach to continuously improve accuracy
Built by Findwise in response to our customers' requirements and vision
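The Findwise framework itself is not shown in the slides. As a rough illustration of “combining dictionaries and regular expressions based on needs”, the sketch below treats each technique as a pluggable annotator and merges their output, keeping the earliest and longest non-overlapping match. All names here are hypothetical; a trained statistical model could be plugged in as one more annotator (not shown).

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


Annotator = Callable[[str], list[Entity]]


def dictionary_annotator(terms: dict[str, str]) -> Annotator:
    """Annotate whole-word occurrences of dictionary terms with their labels."""
    def annotate(text: str) -> list[Entity]:
        found = []
        for term, label in terms.items():
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
                found.append(Entity(m.group(), label, m.start(), m.end()))
        return found
    return annotate


def regex_annotator(pattern: str, label: str) -> Annotator:
    """Annotate every match of a regular expression with one label."""
    compiled = re.compile(pattern)

    def annotate(text: str) -> list[Entity]:
        return [Entity(m.group(), label, m.start(), m.end()) for m in compiled.finditer(text)]
    return annotate


def run_pipeline(text: str, annotators: list[Annotator]) -> list[Entity]:
    """Apply every annotator, then keep the earliest, longest non-overlapping entities."""
    candidates = sorted({e for a in annotators for e in a(text)},
                        key=lambda e: (e.start, -(e.end - e.start)))
    result, last_end = [], -1
    for e in candidates:
        if e.start >= last_end:
            result.append(e)
            last_end = e.end
    return result


annotators = [
    dictionary_annotator({"Findwise": "ORG", "Copenhagen": "LOC"}),
    regex_annotator(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "EMAIL"),
]
entities = run_pipeline("Contact info@findwise.com at Findwise in Copenhagen.", annotators)
print([(e.text, e.label) for e in entities])
```

Adding a new entity type then means adding an annotator, which fits the iterative approach on the slide.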
38. Graphical Annotation Tool
Visual representation of annotated documents
Annotate more documents to improve precision
Easy-to-use, point-and-click interaction
Built by Findwise in response to our customers' requirements and vision