Learn more about Entity Extraction May 2014

Scenarios | Benefits of using entity extraction
• Explore your content: explore the enterprise graph
• Discover insights about your products: monitor trends
• Discover new expertise inside your organization: find the people with the right competences
• Enhance search navigation: filter unstructured data
• Prevent duplicate work: find similar content
• Help your users find their dream home: extract potential decision criteria from natural language
• Visualize your content in a new way: enrich documents with metadata

Discover new expertise inside your organization
Find the people with the right competences

Motivation
• Search for “usability”
• Only people who have tagged themselves with “usability” will be returned
• If we rely only on standard category types and database information, we get only what is in the person database
• But what if you could also find those who write, blog, or tweet about “usability”, without them being explicitly tagged with this category?

Enhanced search index
• The search index is enhanced with information about the topics, keywords, people, places, etc. that authors write about

Enhanced expertise search
• Search for “usability”
• Get improved search results
✓ Discover the competences people have
✓ Discover the interests people have and share
✓ Gather all people writing about the same topic
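The expertise search above can be illustrated with a minimal Python sketch. It assumes each document already carries its author and the entities an extractor found in it; the documents, people and the helpers `build_expertise_index` and `find_experts` are hypothetical, not part of any Findwise product.

```python
from collections import defaultdict

# Hypothetical documents: each has an author and the topics/entities that an
# extractor is assumed to have found in its text.
documents = [
    {"author": "Sarah Jensen", "entities": ["usability", "Copenhagen"]},
    {"author": "Carl Sorensen", "entities": ["usability", "e-commerce"]},
    {"author": "Anders Anderson", "entities": ["search", "Findwise"]},
]

# Hypothetical self-assigned profile tags: all a plain people search can use.
profile_tags = {"Sarah Jensen": ["usability"]}


def build_expertise_index(docs):
    """Map each lower-cased entity to the authors writing about it, with counts."""
    index = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for entity in doc["entities"]:
            index[entity.lower()][doc["author"]] += 1
    return index


def find_experts(query, docs, tags):
    """Rank people by explicit tags plus mentions of the query in their writing."""
    index = build_expertise_index(docs)
    scores = defaultdict(int)
    for person, person_tags in tags.items():
        if query.lower() in (t.lower() for t in person_tags):
            scores[person] += 1
    for person, count in index.get(query.lower(), {}).items():
        scores[person] += count
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda p: (-scores[p], p))


print(find_experts("usability", documents, profile_tags))
```

Carl Sorensen is returned alongside Sarah Jensen even though only she tagged herself with “usability”; his match comes purely from what he writes about.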

Enhance search navigation
Filter unstructured data

Motivation
• Search for “yoga”
• Lots of semi-structured documents (HTML, Word, PDF, etc.)
• Some are missing administrative metadata such as author and date last saved
• Some are missing descriptive metadata such as title, topic, tags and category
• Some results have no proper title
Will you go through all the results to find the relevant ones?

Extract named entities and metadata
• Identify information such as title, keywords, author, summary and subsection titles, and add it to the document
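A rough idea of how missing metadata could be filled in is sketched below in Python. The heuristics are assumptions chosen for illustration (first non-empty line as title, a "By …"/"Author: …" line as author, most frequent non-stop words as keywords); a production extractor would rely on the dictionaries, regular expressions and trained models described later in this deck.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "for", "to", "in", "is", "with", "on", "by"}

# Hypothetical convention for author lines such as "Author: Jane Doe" or "By Jane Doe".
AUTHOR_LINE = re.compile(r"^(?:author:|by)\s+(.+)$", re.IGNORECASE | re.MULTILINE)


def extract_metadata(text, num_keywords=3):
    """Heuristically derive a title, an author and keywords from plain text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0] if lines else None  # assume the first non-empty line is the title
    match = AUTHOR_LINE.search(text)
    author = match.group(1).strip() if match else None
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    keywords = [word for word, _ in counts.most_common(num_keywords)]
    return {"title": title, "author": author, "keywords": keywords}


sample = """Yoga for beginners
By Jane Doe
Yoga improves balance. Breathing exercises support yoga practice.
Balance comes with practice."""
print(extract_metadata(sample))
```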

New filters and improved metadata
• Search for “yoga”
• The newly created data is used to filter documents and improve relevance
✓ Improved visual results (documents have titles)
✓ Improved relevance (titles and subsection titles are ranked higher than body text)
✓ Possibility to filter on authors, topics, places, etc. (use filters rather than pagination)
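The two effects listed above, field-weighted relevance and filtering on extracted metadata, can be sketched in a few lines of Python. The field weights, documents and the `score`/`search` helpers are hypothetical; real search engines expose this through their own field-boost and facet settings.

```python
from collections import Counter

# Hypothetical field weights: title matches count more than subsection titles,
# which count more than body text.
FIELD_WEIGHTS = {"title": 3.0, "subsection_titles": 2.0, "body": 1.0}

# Hypothetical documents whose title, author and place were filled in by extraction.
docs = [
    {"title": "Yoga retreat", "body": "A week of yoga.",
     "author": "Sarah Jensen", "place": "Copenhagen"},
    {"title": "Team update", "body": "Yoga classes start Monday.",
     "author": "Carl Sorensen", "place": "Philadelphia"},
    {"title": "Budget", "body": "Nothing relevant here.",
     "author": "Sarah Jensen", "place": "Copenhagen"},
]


def score(doc, term):
    """Sum weighted term occurrences over the document's text fields."""
    term = term.lower()
    return sum(weight * doc.get(field, "").lower().count(term)
               for field, weight in FIELD_WEIGHTS.items())


def search(documents, term, **filters):
    """Rank matching documents and return facet counts usable as filters."""
    hits = [d for d in documents if score(d, term) > 0]
    # Keep only documents whose metadata matches every requested filter value.
    hits = [d for d in hits if all(d.get(k) == v for k, v in filters.items())]
    hits.sort(key=lambda d: score(d, term), reverse=True)
    facets = {name: Counter(d.get(name) for d in hits) for name in ("author", "place")}
    return hits, facets


hits, facets = search(docs, "yoga")
print([d["title"] for d in hits], facets["place"])
```

Calling `search(docs, "yoga", place="Copenhagen")` narrows the result set with a filter instead of forcing the user to page through results.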

Explore your content
Explore the enterprise graph

Motivation
• Search for ‘Copenhagen’ on your intranet
• Ambiguous query
• Lots of results
• Missing context
• What is the user's intent behind this query?

Relationship Extraction for Entities
• Extract relations from unstructured data
• Built upon named entity recognition
• Relationship extraction enables us to build a graph search solution on top of unstructured data
[Figure: placeholder text in which the entities Sarah Jensen, Philadelphia, Copenhagen, Google, Anders Anderson, Findwise, Microsoft and Carl Sorensen are recognized and extracted]
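One simple way to turn extracted entities into a graph is sketched below in Python. Note that it swaps in a plain sentence co-occurrence heuristic: two entities are linked if they appear in the same sentence. This is not typed relationship extraction (which would label an edge as "works at", "located in", etc.), and the gazetteer and sample text are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical gazetteer standing in for the output of named entity recognition.
ENTITY_TYPES = {
    "Sarah Jensen": "person",
    "Carl Sorensen": "person",
    "Copenhagen": "place",
    "Findwise": "company",
    "Google": "company",
}


def find_entities(sentence):
    """Return the known entities mentioned in a sentence (plain substring lookup)."""
    return sorted(name for name in ENTITY_TYPES if name in sentence)


def build_graph(text):
    """Link each pair of entities that co-occur in a sentence; weight = co-occurrences."""
    graph = defaultdict(lambda: defaultdict(int))
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for a, b in combinations(find_entities(sentence), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


text = ("Sarah Jensen works at Findwise in Copenhagen. "
        "Carl Sorensen presented a Google project with Sarah Jensen.")
graph = build_graph(text)
for neighbour, weight in graph["Sarah Jensen"].items():
    print(f"Sarah Jensen -> {neighbour} ({ENTITY_TYPES[neighbour]}, weight {weight})")
```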

Suggestions as you type, using the graph
• Search for ‘Copenhagen’ on your intranet
✓ Narrow down search results directly from the search box
✓ Disambiguate the query by selecting one of the different types of suggestions (consultants, projects, partners)
✓ Navigate directly to 2nd- or higher-level connections in the graph
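The suggestion behaviour above can be sketched in Python over a tiny hypothetical graph: typing a prefix finds matching nodes, their neighbours are grouped by type (consultant, project, partner) for disambiguation, and a breadth-first search reaches 2nd-level connections. The node names, types and the `suggest`/`connections` helpers are illustrative assumptions.

```python
from collections import defaultdict, deque

# Hypothetical node types for a tiny enterprise graph.
NODE_TYPES = {
    "Copenhagen": "place",
    "Sarah Jensen": "consultant",
    "Intranet project": "project",
    "Microsoft": "partner",
}
# Undirected edges, e.g. produced by relationship extraction.
EDGE_LIST = [
    ("Copenhagen", "Sarah Jensen"),
    ("Copenhagen", "Intranet project"),
    ("Sarah Jensen", "Intranet project"),
    ("Intranet project", "Microsoft"),
]
EDGES = defaultdict(list)
for a, b in EDGE_LIST:
    EDGES[a].append(b)
    EDGES[b].append(a)


def suggest(prefix):
    """For nodes matching the typed prefix, group their direct neighbours by type."""
    groups = defaultdict(list)
    for name in NODE_TYPES:
        if name.lower().startswith(prefix.lower()):
            for neighbour in EDGES.get(name, []):
                groups[NODE_TYPES[neighbour]].append(f"{neighbour} ({name})")
    return dict(groups)


def connections(start, max_depth=2):
    """Breadth-first search: nodes within max_depth hops of start, with their distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in EDGES.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


print(suggest("cop"))
print(connections("Copenhagen"))
```

Here the partner Microsoft is not directly linked to Copenhagen, but is reachable as a 2nd-level connection through the project.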

Business Intelligence, using the graph
• Search for: ‘Customers where we have done projects based on Google technology with at least 1000 hours of consulting time and a revenue of more than 1 MDKK, and where the word “e-commerce” is mentioned many times in the project documentation’
[Figure: Business Intelligence combining project numbers (worked hours), financial numbers (revenue, profits) and project documentation]
What would this query look like in SQL?
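One possible answer is sketched below using Python's built-in `sqlite3` module. The schema, table names and sample rows are hypothetical. The key assumption is the `doc_mentions` table: it only exists if entity extraction has already turned the project documentation into structured mention counts, which is what makes the "e-commerce is mentioned many times" condition expressible in SQL at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects   (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         technology TEXT, hours INTEGER);
CREATE TABLE financials (project_id INTEGER, revenue_dkk INTEGER);
-- Assumed to be filled by running entity extraction over project documentation.
CREATE TABLE doc_mentions (project_id INTEGER, term TEXT, mentions INTEGER);

INSERT INTO customers VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO projects VALUES (10, 1, 'Google', 1200), (11, 2, 'Google', 800);
INSERT INTO financials VALUES (10, 1500000), (11, 2000000);
INSERT INTO doc_mentions VALUES (10, 'e-commerce', 12), (11, 'e-commerce', 30);
""")

# Revenue above 1 MDKK; "many times" is expressed as a threshold parameter.
QUERY = """
SELECT DISTINCT c.name
FROM customers AS c
JOIN projects AS p     ON p.customer_id = c.id
JOIN financials AS f   ON f.project_id = p.id
JOIN doc_mentions AS m ON m.project_id = p.id
WHERE p.technology = 'Google'
  AND p.hours >= 1000
  AND f.revenue_dkk > 1000000
  AND m.term = 'e-commerce'
  AND m.mentions >= ?
"""

print(conn.execute(QUERY, (10,)).fetchall())
```

Without the extracted mention counts, the documentation condition would have to fall back on full-text search outside SQL.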

Discover insights about your products
Monitor trends

Motivation
• Search for the product name ‘Tusin’
• The product is mentioned in different sources, in different contexts (user feedback, marketing material, internal specifications), and using different terminology (on social media compared to the website)
• How do you keep track of all the information?
• How easy is it to identify trends?

Identify the same product in different contexts
• Identify the entities that denote the same product across different sources
[Figure: the product Tusin, also known by the internal name Azure, identified across sources: an internal production specification (Word doc), product marketing material from the website (PDF doc), user comments (feedback about the marketing material or the user experience), user tweets mentioning the product, a product video (video views, video comments, user feedback metric), task items in the internal issues management system, and internal news]
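A minimal way to recognize the same product under different names is an alias dictionary that maps every surface form to one canonical entity, sketched here in Python. The alias table and source texts are hypothetical apart from the two names Tusin and Azure from the slide.

```python
import re

# Hypothetical alias dictionary: every surface form maps to one canonical product.
PRODUCT_ALIASES = {
    "tusin": "Tusin",
    "azure": "Tusin",  # internal name for the same product
}
ALIAS_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, PRODUCT_ALIASES)) + r")\b", re.IGNORECASE
)


def link_products(text):
    """Return the canonical product names mentioned in text, whatever alias is used."""
    return {PRODUCT_ALIASES[m.group(1).lower()] for m in ALIAS_PATTERN.finditer(text)}


# Hypothetical snippets from different sources.
sources = {
    "internal specification": "Azure release 2 adds offline mode.",
    "tweet": "Loving the new #Tusin app!",
    "issue tracker": "AZURE-42: crash on start-up",
}
for source, text in sources.items():
    print(source, link_products(text))
```

A plain dictionary lookup cannot tell this internal name apart from other uses of the same word, so in practice context rules or a trained model would be needed to disambiguate.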

Monitor trends for your products
• Search for ‘Tusin’
or
• Save it as a search term and create a dashboard with search-driven content
✓ Monitor trends
✓ Reduce the time needed to reply to customers or users
✓ Stay competitive
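Once mentions are linked to a canonical product, trend monitoring can start as simple counting per time bucket. The Python sketch below counts mentions per ISO week and flags a rise; the mention data and the "rising" rule (latest recorded week above the previous one) are hypothetical simplifications.

```python
from collections import Counter
from datetime import date

# Hypothetical (date, canonical product) pairs produced by entity linking.
mentions = [
    (date(2014, 4, 28), "Tusin"),
    (date(2014, 4, 30), "Tusin"),
    (date(2014, 5, 6), "Tusin"),
    (date(2014, 5, 7), "Tusin"),
    (date(2014, 5, 8), "Tusin"),
    (date(2014, 5, 8), "Other product"),
]


def weekly_counts(pairs, product):
    """Count mentions of one product per ISO (year, week)."""
    return Counter(tuple(d.isocalendar())[:2] for d, p in pairs if p == product)


def is_rising(counts):
    """True if the latest recorded week has more mentions than the one before it."""
    weeks = sorted(counts)
    return len(weeks) >= 2 and counts[weeks[-1]] > counts[weeks[-2]]


counts = weekly_counts(mentions, "Tusin")
print(sorted(counts.items()), is_rising(counts))
```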

Find similar content

Motivation
• You have just started working on a new material at a construction company
• What is the cost of duplicating work?
• Will you search for previous work?
• What if another team has a similar initiative?

Enhanced Search Index
• Automatically extract entities and representative keywords from content
[Figure: entities and keywords such as Steel Structures, Glass Type 1.A, Project ANSA, Torso Tower and Polyethylene Terephthalate extracted from documents, announcements, public emails and the newsfeed]

• Get suggestions for similar work based on extracted entities
✓ Identify similar work early in the project
✓ Identify potential collaborations
✓ Prevent duplicate work
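Suggesting similar work from extracted entities can be done in many ways; the Python sketch below uses Jaccard similarity between entity sets as one simple, swapped-in measure. The project names, entity sets and the 0.3 threshold are hypothetical; the entity names reuse examples from the figure.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets of extracted entities."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_work(new_entities, earlier_work, threshold=0.3):
    """Return (name, score) pairs for earlier work above the threshold, best first."""
    scored = [(name, jaccard(new_entities, ents)) for name, ents in earlier_work.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])


# Hypothetical entity sets extracted from earlier projects.
earlier_work = {
    "Project A": {"Steel Structures", "Glass Type 1.A", "Polyethylene Terephthalate"},
    "Project B": {"Steel Structures", "Concrete"},
    "Project C": {"Carpet", "Lighting"},
}
new_entities = {"Polyethylene Terephthalate", "Glass Type 1.A", "Facade"}
print(similar_work(new_entities, earlier_work))
```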

Visualize your content in a new way
Enrich documents with metadata

Motivation
• Search for “financial results Copenhagen”
• Search results: documents
• Clicking on a result opens the document
• Does this search answer the user's question?

Identify entities in documents
• Identify locations, revenues, departments, etc. in semi-structured and unstructured data
• Combine with data in spreadsheets or databases
[Figure: documents, a database and spreadsheets combined into an answer]
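The combination step can be illustrated in Python: a location and a revenue figure are extracted from report text, then joined with spreadsheet data to compute an answer that no single document contains. The report snippets, the head-count "spreadsheet", the regular expression and the revenue-per-employee metric are all hypothetical.

```python
import re

# Hypothetical report snippets; in practice these come from documents.
reports = [
    "Q1 report: the Copenhagen office had revenue of 4.2 MDKK.",
    "Q1 report: the Stockholm office had revenue of 3.1 MDKK.",
]
# Hypothetical spreadsheet rows: office -> head count.
headcount = {"Copenhagen": 20, "Stockholm": 10}

KNOWN_PLACES = set(headcount)  # gazetteer of office locations
REVENUE = re.compile(r"revenue of ([\d.]+) MDKK")


def extract_revenue(text):
    """Return (place, revenue in MDKK) if both entities are found in the text."""
    place = next((p for p in KNOWN_PLACES if p in text), None)
    match = REVENUE.search(text)
    if place is None or match is None:
        return None
    return place, float(match.group(1))


def revenue_per_employee():
    """Combine extracted revenues with spreadsheet head counts into a computed answer."""
    answer = {}
    for text in reports:
        extracted = extract_revenue(text)
        if extracted:
            place, revenue = extracted
            answer[place] = revenue / headcount[place]
    return answer


print(revenue_per_employee())
```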

Visualize your content in a new way
• Search for “financial results Copenhagen”
• Additional information is shown
• Computed results can be shown
✓ Enrich documents with metadata
✓ Visualize the content
✓ Compute answers
✓ Make comparisons
✓ Create dashboards based on searches

Help your users find their dream home
Extract potential decision criteria from natural language

Motivation
• Searching for an ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’
• The apartment information consists mostly of structured data (m², number of rooms, postal code, floor)
• Can we improve the search experience?
• Current interface: a long list of static filters, and the search query consists of an area (postal code, street, etc.)

Understanding what the users want
• Here’s how Facebook helps users define their queries:
• Can we interpret the query ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’?

Understanding what the users want
• Searching for ‘apartment with a good view, located in central Copenhagen, well sized bathroom, close to shopping outlets, preferably with 3 rooms’
• Apartments with 3 rooms are boosted in the search results, but those with fewer rooms are not excluded
• Those that mention shopping outlets (such as Netto or Fakta) are boosted
✓ Interpret natural language
✓ Boost results based on ‘preferences’
✓ Better search experience
✓ Increased user satisfaction
[Screenshot callouts: free text search; apartments with 3 rooms are boosted (on the map, a boost can be represented by a bigger pointer)]
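The "boost, don't exclude" idea can be sketched in Python: phrases in the query become soft preferences, and listings are re-ordered by how many preferences they satisfy rather than filtered. The listings, the district mapping for "central Copenhagen", the score weights and the simple regular expression are hypothetical; phrases such as "good view" are ignored in this sketch.

```python
import re

# Hypothetical listings with structured fields plus free-text descriptions.
listings = [
    {"id": 1, "rooms": 3, "district": "Indre By", "text": "Great view, close to Netto."},
    {"id": 2, "rooms": 2, "district": "Indre By", "text": "Quiet courtyard, near Fakta."},
    {"id": 3, "rooms": 3, "district": "Valby", "text": "Large bathroom."},
]
SHOPPING = {"netto", "fakta"}  # shop names treated as 'shopping outlets'
CENTRAL = {"Indre By"}  # hypothetical mapping of 'central Copenhagen' to districts


def parse_query(query):
    """Turn phrases in the query into soft preferences rather than hard filters."""
    q = query.lower()
    prefs = {"central": "central" in q, "shopping": "shopping" in q}
    rooms = re.search(r"(\d+)\s+rooms", q)
    if rooms:
        prefs["rooms"] = int(rooms.group(1))
    return prefs


def rank(items, prefs):
    """Score listings by matched preferences; nothing is excluded, only re-ordered."""
    def score(item):
        s = 2 if prefs.get("rooms") == item["rooms"] else 0
        if prefs.get("central") and item["district"] in CENTRAL:
            s += 1
        if prefs.get("shopping") and any(shop in item["text"].lower() for shop in SHOPPING):
            s += 1
        return s
    return sorted(items, key=score, reverse=True)


query = ("apartment with a good view, located in central Copenhagen, well sized "
         "bathroom, close to shopping outlets, preferably with 3 rooms")
print([item["id"] for item in rank(listings, parse_query(query))])
```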

Entity Extraction
Entity extraction is the process of identifying named entities (such as locations, people and companies) in a block of text.
• Add structure to unstructured data
• New possibilities for interpreting the data
• Improve data quality and findability of documents
• Reduce the time users spend manually structuring content

Entity Extraction Framework
• Combines dictionaries with a trained model and regular expressions, depending on needs
• Scalable, adaptable and extendable framework
• Automatically enriches documents with named entities
• Iterative approach to continuously improve accuracy
Built by Findwise in response to our customers' requirements and vision
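How dictionaries and regular expressions can be combined in one pipeline is sketched below in Python. This is not the Findwise framework; the `Entity` type and the extractor factories are hypothetical. A trained model could be plugged in as another callable with the same signature (text in, list of entities out), which is what keeps such a design extendable.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


def dictionary_extractor(terms: dict) -> Callable[[str], List[Entity]]:
    """Build an extractor that finds exact, case-sensitive dictionary terms."""
    def extract(text: str) -> List[Entity]:
        return [Entity(term, label, m.start(), m.end())
                for term, label in terms.items()
                for m in re.finditer(re.escape(term), text)]
    return extract


def regex_extractor(pattern: str, label: str) -> Callable[[str], List[Entity]]:
    """Build an extractor for pattern-like entities such as e-mail addresses."""
    compiled = re.compile(pattern)

    def extract(text: str) -> List[Entity]:
        return [Entity(m.group(0), label, m.start(), m.end()) for m in compiled.finditer(text)]
    return extract


def run_pipeline(text: str, extractors: Iterable[Callable[[str], List[Entity]]]) -> List[Entity]:
    """Run every extractor, merge results without duplicates, sort by position."""
    merged = {e for extract in extractors for e in extract(text)}
    return sorted(merged, key=lambda e: (e.start, e.end))


pipeline = [
    dictionary_extractor({"Findwise": "ORG", "Copenhagen": "LOC"}),
    regex_extractor(r"[\w.]+@[\w.]+\.\w+", "EMAIL"),
]
text = "Contact anders.haggdahl@findwise.com at Findwise in Copenhagen."
for entity in run_pipeline(text, pipeline):
    print(entity)
```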

Entity Extraction Framework
Iterative cycle: Autotag → Edit → Evaluate → Incremental train
90% accuracy: the Danish and Swedish entity extractors can reach 90% accuracy
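The deck does not say how the 90% figure is measured. A common way to evaluate entity extraction, assumed here, is exact-match precision, recall and F1 over (start, end, label) spans, comparing automatic tags against manually edited annotations. The Python sketch below uses hypothetical spans.

```python
def evaluate(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical manual annotations (from the edit step) and automatic tags.
gold = {(0, 12, "PER"), (22, 30, "ORG"), (34, 44, "LOC")}
predicted = {(0, 12, "PER"), (22, 30, "ORG"), (34, 44, "ORG")}
precision, recall, f1 = evaluate(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A wrongly labelled span (LOC tagged as ORG) counts as both a false positive and a false negative under exact matching, which is a strict but simple criterion.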

Graphical Annotation Tool
• Visual representation of annotated documents
• Annotate more documents to improve precision
• Easy-to-use, point-and-click interaction
Built by Findwise in response to our customers' requirements and vision

Anders Häggdahl
anders.haggdahl@findwise.com
