Semantics And Search Jon Atle Gulla
1. From Google Search to Semantic Exploration
Jon Atle Gulla
Professor
Norwegian University of Science and Technology
jag@idi.ntnu.no
Semantic Days 2007
Jon Atle Gulla
2. Agenda
Traditional search applications
Adding shallow linguistics to traditional search
The concept of semantic search
Ontologies in search applications
Ontologies for semantic annotation & exploration
Ontology-driven query interpretation
"Hakia thinks that indexing has plateaued and that semantic technologies will take over for the next generation of search".
MacManus, R. “Hakia Takes On Google With Semantic Technologies”.
http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php
3. The Language Problem in Search
People use language differently
[Figure: authors on one side, the information user on the other]
What is the content of the document?
How to know if this document answers the query?
4. The Google Search Experience
Query
Similarity
Page rank
Index
Linguistics
Results
5. Traditional Search Principles
Bag-of-words principle
  Machine understands document as a set of word frequencies
Word matching principle
  Syntactic search: Relevant documents are documents that contain exactly those words that appear in the query
  Morpho-syntactic search: Relevant documents are documents that contain inflectional variants of exactly those words that appear in the query
One shot principle
  Query and result set ignored when new query is posted
You see:
  A drilling rig or oil rig is a structure housing equipment used to drill for and extract oil or natural gas from underground reservoirs. Drilling rigs can also be used to drill for water or for exploration purposes.
Machine sees:
  drill(4) equip(1) explor(1) extract(1) hous(1) natur(1) oil(2) purpos(1) reservoir(1) rig(3) structur(1) underground(1) water(1)
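The bag-of-words view above can be sketched in a few lines of Python. This is only an illustration: the stop-word list and the `toy_stem` suffix stripper are assumptions made for this example (the slide does not say which stemmer or stop words were used), so only the counts for drill, rig and oil are expected to line up with the slide.

```python
import re
from collections import Counter

TEXT = (
    "A drilling rig or oil rig is a structure housing equipment used to "
    "drill for and extract oil or natural gas from underground reservoirs. "
    "Drilling rigs can also be used to drill for water or for exploration "
    "purposes."
)

# Hypothetical stop-word list; the slide does not state which words were dropped.
STOP_WORDS = {"a", "or", "is", "to", "for", "and", "from", "can", "also", "be", "used"}


def toy_stem(word):
    """Strip a trailing 'ing' or 's' when at least three letters remain.

    A deliberately crude stand-in for a real stemmer.
    """
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Reduce a text to stem frequencies, discarding word order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(toy_stem(t) for t in tokens if t not in STOP_WORDS)


counts = bag_of_words(TEXT)
print(counts.most_common(3))  # [('drill', 4), ('rig', 3), ('oil', 2)]
```

Once the text is reduced to this counter, nothing in it distinguishes a rig that drills for oil from any other text that happens to use the same words.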
6. Traditional Search Principles
Bag-of-words principle
  Machine understands document as a set of word frequencies
Word matching principle
  Syntactic search: Relevant documents are documents that contain exactly those words that appear in the query
  Morpho-syntactic search: Relevant documents are documents that contain inflectional variants of exactly those words that appear in the query
One shot principle
  Query and result set ignored when new query is posted
User need: Christmas tree
Index:
  A Christmas tree is one of the most popular traditions associated with the celebration of Christmas. It is normally an evergreen coniferous tree that is brought into a home or used in the open, and is decorated with Christmas lights and colourful ornaments during the days around Christmas.
  A Christmas tree is a set of valves, pipes, and fittings used to control the flow of oil and gas as it leaves a well and enters a pipeline.
Relevance given by document similarity
7. Traditional Search Principles
Bag-of-words principle
  Machine understands document as a set of word frequencies
Word matching principle
  Syntactic search: Relevant documents are documents that contain exactly those words that appear in the query
  Morpho-syntactic search: Relevant documents are documents that contain inflectional variants of exactly those words that appear in the query
One shot principle
  Query and result set ignored when new query is posted
Implementation:
  Search query: Christmas trees
  Document relevant to query if cosine similarity above a certain threshold:
  $$\mathrm{sim}(q, d) = \frac{q \cdot d}{|q|\,|d|} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}}$$
  q: vector representation of query
  d: vector representation of document
  Result set
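A minimal sketch of the cosine test follows. The four-word vocabulary, the two document vectors and the threshold value are hypothetical and chosen only to show the computation.

```python
import math


def cosine_similarity(q, d):
    """Cosine of the angle between two equal-length term-frequency vectors."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_q * norm_d)


# Hypothetical vocabulary order: christmas, tree, valve, ornament
query = [1, 1, 0, 0]
doc_oil = [1, 1, 2, 0]      # oil-industry text mentioning valves
doc_holiday = [2, 1, 0, 3]  # holiday text mentioning ornaments

THRESHOLD = 0.5  # hypothetical cut-off

for name, doc in (("oil", doc_oil), ("holiday", doc_holiday)):
    score = cosine_similarity(query, doc)
    print(name, round(score, 3), score > THRESHOLD)
```

With these made-up vectors both documents clear the threshold, which is the ambiguity the Christmas tree example is meant to expose: word matching alone cannot tell the two senses apart.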
8. Adding Shallow Linguistics to Search
Clustering or log analysis for grouping
search results for ‘oil’
Text categorization
Entity search
Teaser generation
Spell checking
Collocations
9. But
“A drilling rig or oil rig is a structure housing equipment used to drill for and extract oil or natural gas from underground reservoirs. Drilling rigs can also be used to drill for water or for exploration purposes.” (Ref: Wikipedia)
Text is still just a set of strings:
  drill(4) equip(1) explor(1) extract(1) hous(1) natur(1) oil(2) purpos(1) reservoir(1) rig(3) structur(1) underground(1) water(1)
Semantic Search Principle:
  Use ontologies to represent domain vocabulary, documents’ content and/or user’s information needs
[Figure: ontology fragment with the concepts rig, oil rig, drilling rig, drill, oil, natural gas and water, linked by subclassOf, sameAs, partOf and usedFor relations]
10. Semantic Approaches to Search
Search principles   Syntactic search   Semantic search
Document view       Bag-of-words       Terms and concepts
Search approach     Word matching      Concept matching
Search process      One shot           Exploratory session
Applications of ontologies in semantic search:
  Help user formulate semantic queries
  Reformulate/reinterpret queries
  Browse domain
  Formulate related queries
  Interoperability between search applications
  Semantic indexing of documents
Callouts on the slide: Scientific reports; IIP project
11. 1. Ontologies in Semantic Exploration
Use graphical ontologies for query formulation
Semantic annotations of documents
Construct queries graphically
Use ontological structures to expand query
Use ontology to visualize search results
12. Query Formulation
Queries expanded from ontological structures
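One way to picture expansion from ontological structures is a one-step walk over concept relations. The sketch below is an assumption-laden illustration: the triples are hypothetical, only the relation names (subclassOf, sameAs, partOf) come from the slides, and `expand_query` is a name invented here.

```python
# Hypothetical triples; only the relation names appear in the slides.
TRIPLES = [
    ("oil rig", "subclassOf", "rig"),
    ("drilling rig", "sameAs", "oil rig"),
    ("drill", "partOf", "drilling rig"),
]


def expand_query(term, triples, relations=("sameAs", "subclassOf")):
    """Return the term plus every concept one step away over the chosen relations."""
    expanded = [term]
    for subject, relation, obj in triples:
        if relation not in relations:
            continue
        if subject == term and obj not in expanded:
            expanded.append(obj)
        elif obj == term and subject not in expanded:
            expanded.append(subject)
    return expanded


print(expand_query("oil rig", TRIPLES))  # ['oil rig', 'rig', 'drilling rig']
```

Which relations to follow is a design choice: following sameAs adds synonyms, while following subclassOf upward broadens the query and may cost precision.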
13. Query Refinement
Use ontological structures to explore the domain
14. 2. Ontology-Driven Query Interpretation
[Figure: the user query, expressed in user terminology, passes through a semantic layer before it reaches a standard search engine over the domain document collection. The semantic layer performs semantic query interpretation (user query interpretation mapping) using an ontology trained on the person and the domain collection.]
15. Training Ontology for Search
User query: christmas tree
Documents viewed by user (and considered relevant)
Characteristic terms in these documents express user’s interpretation of CHRISTMAS TREE for this document collection
Concept          Prominent document terms
CHRISTMAS TREE   0.95 christmas tree
                 0.80 christmas trees
                 0.35 x-tree
                 0.05 valves
                 0.02 wellhead
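The slides do not state how the term weights are computed. As a rough sketch under that caveat, the example below weights each candidate term by the share of viewed documents that mention it. The weighting scheme, the function name `term_share` and the four sample documents are all assumptions made for illustration, so the numbers will not match the slide.

```python
import re


def term_share(viewed_docs, terms):
    """Weight each term by the fraction of viewed documents that contain it.

    Assumed scheme for illustration only; the slides do not give the formula.
    """
    if not viewed_docs:
        return {term: 0.0 for term in terms}
    weights = {}
    for term in terms:
        # Whole-term match, so "christmas tree" does not also count "christmas trees".
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = sum(1 for doc in viewed_docs if pattern.search(doc))
        weights[term] = hits / len(viewed_docs)
    return weights


# Hypothetical documents the user viewed and considered relevant.
viewed = [
    "The christmas tree sits on top of the wellhead.",
    "A christmas tree controls the flow from the well.",
    "Each x-tree is fitted with valves.",
    "The christmas tree has several valves.",
]

weights = term_share(viewed, ["christmas tree", "x-tree", "valves", "wellhead"])
print(weights)
```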
16. The Personalized Ontology
Each concept described in terms of weighted words
Words correspond to user’s assessment of which information is relevant to a concept for this document base
Concept-term associations created automatically based on user’s behavior
Concept-term matrix a dynamic structure that reflects user’s preferences and behavior
[Figure: an ontology (labelled "Football ontology" on the slide) whose concepts are linked to weighted index terms]
Concept          Index terms
CHRISTMAS TREE   0.95 christmas tree, 0.80 christmas trees, 0.35 x-tree, 0.05 valves, 0.02 wellhead
WELL             0.95 well, 0.98 wells, ...
PIPE             0.95 pipe, 0.10 pipes
17. Semantic Search Query
User query: CHRISTMAS TREE
Retrieved from ontology:
  An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well
Query mapping:
  christmas tree:0.95, christmas trees:0.8, x-tree:0.35, valves:0.05, wellhead:0.04
Concept          Prominent document terms
CHRISTMAS TREE   0.95 christmas tree
                 0.80 christmas trees
                 0.35 x-tree
                 0.05 valves
                 0.04 wellhead
Matches in document base:
  Christmas trees are used on both subsea and surface wellheads and both are available in a wide range of sizes and configurations, ...
  A Christmas tree is one of the most popular traditions associated with the celebration of Christmas. ...
  The function of a christmas tree is to both prevent the release of oil or gas from an oil well into the environment and also to direct and control the flow of formation fluids from the well. ...
  Private Christmas trees are not usually put up until at least the middle of December and are usually taken down by the 6th of January, ...
  It is normally an evergreen tree that is brought into a home or used in the open, and is decorated with Christmas lights and colourful ornaments during the days around Christmas.
  Good understanding of topside equipment used, including x-trees and wellhead systems
  Wellhead valves are used to isolate the flow of oil or gas at the takeoff from an oil or gas well.
  VENTILTRE er en ventilenhet montert på toppen av stigerør eller brønnhode, ofte kalt juletre (Norwegian: "A valve tree is a valve unit mounted on top of a riser or wellhead, often called a Christmas tree")
  A wellhead consists of the spools, valves, and other components which contain the pressure within the well.
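The query-mapping step can be read as a lookup in the concept-term matrix. The sketch below uses the weights shown on this slide; the dictionary layout and the helper names `map_query` and `render` are assumptions made for illustration, not the system's actual interface.

```python
# Weights copied from the slide; the data structure itself is an assumption.
CONCEPT_TERMS = {
    "CHRISTMAS TREE": {
        "christmas tree": 0.95,
        "christmas trees": 0.80,
        "x-tree": 0.35,
        "valves": 0.05,
        "wellhead": 0.04,
    },
}


def map_query(concept, concept_terms):
    """Map a concept to its index terms, highest weight first."""
    terms = concept_terms.get(concept, {})
    return sorted(terms.items(), key=lambda item: item[1], reverse=True)


def render(weighted_terms):
    """Format weighted terms in the 'term:weight' style used on the slide."""
    return ", ".join(f"{term}:{weight:g}" for term, weight in weighted_terms)


print(render(map_query("CHRISTMAS TREE", CONCEPT_TERMS)))
# christmas tree:0.95, christmas trees:0.8, x-tree:0.35, valves:0.05, wellhead:0.04
```

The resulting weighted term list is what would be handed to the standard search engine in place of the user's original query.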
18. Semantic Search Results
Retrieved from ontology:
  An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well
Matches in document base, with query/document similarity:
  strong (plural form): Christmas trees are used on both subsea and surface wellheads and both are available in a wide range of sizes and configurations, ...
  weak (singular form, but other words different): A Christmas tree is one of the most popular traditions associated with the celebration of Christmas. ...
  strong (singular form): The function of a christmas tree is to both prevent the release of oil or gas from an oil well into the environment and also to direct and control the flow of formation fluids from the well. ...
  weak (plural form, but other words different): Private Christmas trees are not usually put up until at least the middle of December and are usually taken down by the 6th of January, ...
  no (different words, christmas related): It is normally an evergreen tree that is brought into a home or used in the open, and is decorated with Christmas lights and colourful ornaments during the days around Christmas.
  strong (synonyms): Good understanding of topside equipment used, including x-trees and wellhead systems
  acceptable (related words): Wellhead valves are used to isolate the flow of oil or gas at the takeoff from an oil or gas well.
  no (related words, ontology not trained in this language): VENTILTRE er en ventilenhet montert på toppen av stigerør eller brønnhode, ofte kalt juletre (Norwegian: "A valve tree is a valve unit mounted on top of a riser or wellhead, often called a Christmas tree")
  acceptable (related words): A wellhead consists of the spools, valves, and other components which contain the pressure within the well.
Precision: 4/4
Recall: 5/6
19. Keyword Search Query
User query: x-tree
User interpretation: CHRISTMAS TREE:0.35
Retrieved from ontology:
  An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well
Query mapping:
  christmas tree:0.95, christmas trees:0.8, x-tree:0.35, valves:0.05, wellhead:0.04
Concept          Prominent document terms
CHRISTMAS TREE   0.95 christmas tree
                 0.80 christmas trees
                 0.35 x-tree
                 0.05 valves
                 0.04 wellhead
Matches in document base:
  Christmas trees are used on both subsea and surface wellheads and both are available in a wide range of sizes and configurations, ...
  The function of a christmas tree is to both prevent the release of oil or gas from an oil well into the environment and also to direct and control the flow of formation fluids from the well. ...
  Good understanding of topside equipment used, including x-trees and wellhead systems
  Wellhead valves are used to isolate the flow of oil or gas at the takeoff from an oil or gas well.
  A wellhead consists of the spools, valves, and other components which contain the pressure within the well.
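For a keyword query the lookup runs in the opposite direction: the keyword is first interpreted as a concept, and that concept's terms then form the search query. A minimal sketch, assuming the matrix is stored as a dictionary and using an invented helper `interpret_keyword`; the second concept and its weights are hypothetical filler.

```python
CONCEPT_TERMS = {
    # Weights for CHRISTMAS TREE are copied from the slide.
    "CHRISTMAS TREE": {
        "christmas tree": 0.95,
        "christmas trees": 0.80,
        "x-tree": 0.35,
        "valves": 0.05,
        "wellhead": 0.04,
    },
    # Hypothetical second concept, included only so the lookup has a non-match.
    "PIPE": {"pipe": 0.95, "pipes": 0.10},
}


def interpret_keyword(keyword, concept_terms):
    """Return (concept, weight) pairs whose index terms include the keyword, best first."""
    hits = [
        (concept, terms[keyword])
        for concept, terms in concept_terms.items()
        if keyword in terms
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(interpret_keyword("x-tree", CONCEPT_TERMS))  # [('CHRISTMAS TREE', 0.35)]
```

The interpreted concept can then be expanded into its full weighted term list, which is why the x-tree query also reaches documents that only say "christmas tree".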
20. Semantic Search - Learning
No fixed set of relevant documents – depends on user preferences
[Figure: learning loop linking the user query CHRISTMAS TREE, the personalized concept-term matrix, the result page, and the documents viewed by user (and considered relevant)]
21. 2. IIP Ontology on Web Documents
Experiment with real document collection
User terminology: horizontal tree
Interpretation of ‘horizontal tree’ (two ranked concept lists on the slide):
  HORIZONTAL VESSEL 0.162, HORIZONTAL BOREHOLE 0.138, HORIZONTAL CHRISTMAS TREE 0.088, HORIZONTAL TUBING HANGER 0.072, PLANE 0.057, INTERSECTION 0.055, PIPING END 0.051, BENDING STRESS 0.043, SHIFTING TOOL 0.040, AXIS 0.037, ...
  WELLHEAD HOUSING 0.109, CONDUCTOR HOUSING 0.109, WEAR BUSHING 0.101, RING JOINT GASKET 0.096, SUBSEA PRODUCTION MANIFOLD 0.096, TESTING TOOL 0.088, BORE PROTECTOR 0.088, HORIZONTAL CHRISTMAS TREE 0.085, RUNNING TOOL 0.076, TREE 0.068, TUBING SPOOL 0.060, ...
Ranked concept combinations:
  HORIZONTAL CHRISTMAS TREE                              Score: 0.01488
  CONDUCTOR HOUSING, HORIZONTAL VESSEL                   Score: 0.00586
  WELLHEAD HOUSING, HORIZONTAL VESSEL                    Score: 0.00586
  WEAR BUSHING, HORIZONTAL VESSEL                        Score: 0.00411
  CHRISTMAS TREE, HORIZONTAL CHRISTMAS TREE              Score: 0.00369
  HORIZONTAL CHRISTMAS TREE, HORIZONTAL VESSEL           Score: 0.00344
  TUBING SPOOL, HORIZONTAL VESSEL                        Score: 0.00323
  CONDUCTOR HOUSING, HORIZONTAL CHRISTMAS TREE           Score: 0.00317
  WELLHEAD HOUSING, HORIZONTAL CHRISTMAS TREE            Score: 0.00317
  TUBING HANGER, HORIZONTAL TUBING HANGER                Score: 0.00295
  BORE PROTECTOR, HORIZONTAL VESSEL                      Score: 0.00284
  TESTING TOOL, HORIZONTAL BOREHOLE                      Score: 0.00243
  ...
Mapping to query based on document content:
  horizontal tree 1.0
  horizontal christmas tree 1.0
  horizontal christmas trees 1.0
  horizontal x-tree 1.0
  horizontal x-trees 1.0
Weighted terms from the web documents:
  sentre 0.465, deepwater 0.216, atlantic 0.092, horizontal 0.088, ..., gulf 0.083, transocean 0.078, field 0.072, bluewater 0.070, deep 0.066
[Figure labels: User query; Semantic layer; Semantic query interpretation; User query interpretation mapping; Ontology adapted using web documents from the oil business; IIP ontology trained on web oil documents; Domain collection; Web collection; Reformulated query; from different domains]
22. Conclusions
Traditional search based on keyword matching and shallow linguistics
Ontologies provide vocabulary for semantic search
Graphical ontology for query formulation
Semantic exploration of domain
Visual queries
Trained ontology for query interpretation
Ontology maps between concepts and domain terms
Semantic interpretation hidden to users
Challenges
Linking concepts to terms
Scalability