1. Lexical & Query Semantics Differences for Information Retrieval
Why PageRank is Sometimes Better for Semantics
2. Closing the Gap between Search Query Language and Document Language
• There are three components of Information Retrieval Systems:
• Query Understanding
• Document-Query Relevance Understanding
• Document Clustering and Ranking
• The path from a “search query” to a “search document” involves query parsing, processing, augmenting, scoring, ranking and clustering.
• Query Understanding is where SEO starts.
• Document Creation is where SEO continues.
• Document Ranking is where SEO repeats itself.
Source: Query Language Determination Using Query Terms and Interface Language
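The path from query to document can be sketched as a toy pipeline. This is a minimal illustration under simple assumptions: regex tokenization, a small stopword list, a hand-written synonym map (`SYNONYMS`, hypothetical data) and raw term counts for scoring. Clustering is left out, and no production search engine is claimed to work this way.

```python
import re
from collections import Counter

# Hypothetical synonym map for the "augmenting" step (illustrative data only).
SYNONYMS = {"cheap": ["affordable"], "hotel": ["hotels", "lodging"]}
STOPWORDS = frozenset({"the", "a", "in", "for"})

def parse(query: str) -> list[str]:
    """Parsing: lowercase the raw text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())

def process(tokens: list[str]) -> list[str]:
    """Processing: drop stopwords that carry little retrieval signal."""
    return [t for t in tokens if t not in STOPWORDS]

def augment(terms: list[str]) -> set[str]:
    """Augmenting: add synonyms so the query can reach the documents' vocabulary."""
    expanded = set(terms)
    for t in terms:
        expanded.update(SYNONYMS.get(t, []))
    return expanded

def score(terms: set[str], doc: str) -> int:
    """Scoring: count how often any query term occurs in the document."""
    counts = Counter(parse(doc))
    return sum(counts[t] for t in terms)

def rank(query: str, docs: list[str]) -> list[str]:
    """Ranking: order documents by score, highest first (ties keep input order)."""
    terms = augment(process(parse(query)))
    return sorted(docs, key=lambda d: score(terms, d), reverse=True)

docs = ["Affordable lodging in Paris", "Paris museum guide", "Cheap hotel deals"]
print(rank("cheap hotel in Paris", docs))
# -> ['Affordable lodging in Paris', 'Cheap hotel deals', 'Paris museum guide']
```

The first document shares only “Paris” with the query's words, yet ranks first because augmentation bridges “cheap hotel” to “affordable lodging”.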
3. What is this Search Query Language?
• The Search Query Language concept was invented in the Cranfield Experiments in the late 1950s.
• Scientists realized that when people query documents, the language gets denser and words change their meaning.
• There is a huge vocabulary difference between “queries” and “documents”.
• This is because people do not know exactly what to ask a search engine; they only know what represents the topic.
• The “query language” uses “knowledge representation” with “dense vectors”.
• Query term weight calculation was born during these experiments.
Source: Augmenting Queries With Synonyms From Synonyms Map
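Query term weighting can be illustrated with inverse document frequency (IDF). IDF is swapped in here as a standard, well-known weighting; the slide does not say which formula the Cranfield work used, and the tiny “aerospace” corpus below is made up.

```python
import math

def idf_weights(query_terms, documents):
    """Weight each query term by inverse document frequency:
    idf(t) = log(N / df(t)), with N documents and df(t) documents containing t.
    Rarer terms get larger weights."""
    n = len(documents)
    doc_terms = [set(doc.lower().split()) for doc in documents]
    weights = {}
    for term in query_terms:
        df = sum(1 for terms in doc_terms if term in terms)
        # A term that no document uses gets 0.0 here: one visible symptom of the
        # vocabulary gap between queries and documents.
        weights[term] = math.log(n / df) if df else 0.0
    return weights

docs = ["jet engine noise", "jet wing design", "wing flutter analysis", "engine cooling"]
print(idf_weights(["jet", "flutter", "airplane"], docs))
```

“flutter” (one document) outweighs “jet” (two documents), while the searcher's word “airplane” contributes nothing because the documents never use it.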
4. Query Search Language
• The Cranfield Experiments, led by Cyril W. Cleverdon, were among the first Information Retrieval experiments.
• They were designed to test the efficiency of indexing systems.
• Vannevar Bush’s paper “As We May Think” is cited in the research.
• The Cranfield Experiments introduced the “Search Language” concept to acknowledge that words change their meanings inside search queries, even when they are used the same way inside documents.
• Information Retrieval has to distinguish between “understanding relevance” and “understanding the query”.
• To understand the query, a search engine can’t use the same language model it uses to understand documents.
• Document language and query language are completely different.
• Inside documents, we see “lexical semantics”.
• Inside queries, we see “query semantics” with the “search language”.
Source: “As We May Think” – Vannevar Bush; Cranfield Experiments, Cyril W. Cleverdon, 1958.
5. An Algorithm doesn’t have to be liked by your logic
• An algorithm doesn’t have to make sense.
• An algorithm has to be useful.
• The Cranfield Experiments have been debated for decades, and they are still cited by new research.
• The Cranfield Experiments do not explain why their method works; they only show that it works.
• The experiments asked test subjects to take documents on the “aerospace” topic and write “keywords”, or “search queries”, for that topic.
• Test subjects ranked the documents based on their own query terms and their own judgement.
• The Cranfield Experiments created the concepts of “search language” and “document language” along with “natural language query”.
Source: Query Generation Using Structural Similarity Between Documents
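The Cranfield setup (queries, documents and human relevance judgements) is still the basic recipe for evaluating retrieval systems: a system's output is compared with the judgements using precision and recall. A minimal sketch with hypothetical document IDs and judgements:

```python
def precision_recall(retrieved, relevant):
    """Compare one query's retrieved documents with human relevance judgements."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0  # share of results that are relevant
    recall = hits / len(relevant) if relevant else 0.0       # share of relevant docs that were found
    return precision, recall

# Hypothetical judgements for one "aerospace" query.
judged_relevant = {"d1", "d3", "d4", "d5"}
system_output = ["d1", "d2", "d3"]
print(precision_recall(system_output, judged_relevant))  # -> (0.6666666666666666, 0.5)
```

Consistent with the slide's point, the measure says nothing about *why* a method works. It only records whether its output agrees with the human judgements.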
6. Lexical Semantics
• Lexicosemantics involves word-sense disambiguation, word compositionality and the syntax-semantics interface of language.
• Lexicosemantics helps Formal Semantics (Natural Language).
• Formal Semantics studies the grammatical meaning of natural language with methods from theoretical computer science.
• Lexical Semantics helps with the construction of WordNets, FrameNets, Knowledge Bases and Index Tiers.
• Lexical Semantics is useful for Search Engines to process a text item and understand the “Semantic Scope” of sentences through “modality”, “tense”, “binding”, “aspect” and pragmatics.
• Lexical semantics involves hyponymy, hypernymy, antonymy, homonymy, polysemy, meronymy, holonymy and semantic networks.
Source: Query Generation Using Structural Similarity Between Documents
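Lexical relations such as hypernymy, hyponymy, meronymy and holonymy can be stored as a small semantic network and traversed. Real systems would draw on a resource like WordNet; here a hand-built dictionary with hypothetical entries keeps the sketch self-contained.

```python
# A tiny hand-built lexicon (hypothetical entries, not WordNet data).
HYPERNYM = {            # word -> its direct hypernym ("is a kind of")
    "coca cola": "soft drink",
    "pepsi": "soft drink",
    "soft drink": "beverage",
    "beverage": "liquid",
}
MERONYMS = {            # whole (holonym) -> its parts (meronyms)
    "motor vehicle": ["engine", "wheel", "clutch pedal"],
}

def hypernym_chain(word):
    """Follow 'is a kind of' links upward until no more general term exists."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def hyponyms(word):
    """Direct hyponyms: the words whose hypernym is `word`."""
    return sorted(w for w, parent in HYPERNYM.items() if parent == word)

def holonyms(part):
    """Wholes that list `part` among their meronyms."""
    return sorted(whole for whole, parts in MERONYMS.items() if part in parts)

print(hypernym_chain("coca cola"))  # -> ['soft drink', 'beverage', 'liquid']
print(hyponyms("soft drink"))       # -> ['coca cola', 'pepsi']
print(holonyms("clutch pedal"))     # -> ['motor vehicle']
```

Each relation is stored in one direction only; the inverse relation (hyponym, holonym) is derived by scanning, which is fine at this toy scale.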
7. Do You Remember Google Merge?
• What if Google became a semantic search engine by buying another one?
• Oingo was the first search engine focused on meaning-based relevance and advertising.
• It became “Applied Semantics” in 2001.
• Google and Applied Semantics merged on 18 April 2003.
8. Applied Semantics (Oingo): The First Conceptual Search Engine
• Applied Semantics was created by Eytan Elbaz in 1999.
• Information Extraction and Information Responsiveness work differently from Information Retrieval.
• Lexical relations alone do not capture the meaning of query terms, but query semantics do. Thus, query semantics were used for the first time to augment and expand queries.
• It is one of the first designs to use “semantic distance” and “relationship strength” to create a semantic network of concepts.
• It paved the way for “Index Tiering”.
Typically, search engines match the search terms to the documents as a whole. If the user is interested in specific information, for example, “sharks”, but a particular document about “beaches around the world”, for example, only has one sentence about sharks, it is unlikely that the search engine would return the document. Documents like the one described are likely to score very low under the query for “sharks”, if at all, because the document as a whole is not “about” sharks.
Source: Methods and systems for detecting and extracting information
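The “sharks” problem in the quoted passage can be reproduced with two toy scoring functions. Both use simple term density and are assumptions for illustration, not the patent's method: scoring the whole document dilutes the single relevant sentence, while scoring the best passage keeps it visible.

```python
import re

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def document_score(query, text):
    """Whole-text score: fraction of the text's tokens that are query terms."""
    q = set(tokens(query))
    toks = tokens(text)
    return sum(t in q for t in toks) / len(toks) if toks else 0.0

def best_passage_score(query, doc):
    """Passage-level score: split into sentences and keep the best-matching one,
    so one on-topic sentence is not diluted by the rest of the document."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    return max((document_score(query, s) for s in sentences), default=0.0)

doc = ("Beaches around the world offer white sand. Many beaches have lifeguards. "
       "Some waters have sharks. Tourists love sunsets.")
print(document_score("sharks", doc))      # 1 match in 18 tokens
print(best_passage_score("sharks", doc))  # -> 0.25
```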
9. Do You Remember Google Merge?
• Similarity (“gluttonous” is similar to “greedy”) – Near Synonyms
• Membership (“commissioner” is a member of “commission”)
• Meronymy (whole/part relations) (“motor vehicle” has part “clutch pedal”)
• Substance (e.g. “lumber” has substance “wood”)
• Product (e.g. “Microsoft Corporation” produces “Microsoft
Access”)
• Attribute (“past”, “preceding” are attributes of “timing”)
• Causation (e.g. travel causes displacement/motion)
• Entailment (e.g. buying entails paying)
• Lateral bonds (concepts closely related to one another, but not in one of the other relationships, e.g. “dog” and “dog collar”)
• Capitonyms (“Polish” (nationality), “polish” (to shine))
• Troponym (Walking -> Hustle, Trot, Crawl)
• Eponym (Tommy John Surgery, Navneet Panda -> Panda Update)
• Demonym (New Yorkers -> Population of New York City, not State)
• Acronyms (NASA: North American Saxophone Alliance, National Auto Sport Association, National Association of Students of Architecture)
Source: Bill Slawski
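“Semantic distance” and “relationship strength” can be modelled as a weighted graph: concepts are nodes, typed relations are edges, and a stronger relation type gets a lower cost. The per-type costs are hypothetical, and the “leash” edge is made up to show a multi-step path. Distance is computed with Dijkstra's shortest-path algorithm.

```python
import heapq

# Hypothetical cost per relation type: a lower cost means a stronger relationship.
RELATION_COST = {"similarity": 1.0, "meronymy": 2.0, "product": 2.0, "lateral": 3.0}

# The first four edges come from the relation examples above; "leash" is a made-up extra node.
EDGES = [
    ("gluttonous", "greedy", "similarity"),
    ("motor vehicle", "clutch pedal", "meronymy"),
    ("microsoft corporation", "microsoft access", "product"),
    ("dog", "dog collar", "lateral"),
    ("dog collar", "leash", "similarity"),
]

def build_graph(edges):
    """Adjacency list; relations are treated as undirected for distance purposes."""
    graph = {}
    for a, b, rel in edges:
        cost = RELATION_COST[rel]
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph

def semantic_distance(graph, source, target):
    """Dijkstra's algorithm: total cost of the cheapest chain of relations."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in graph.get(node, []):
            new_dist = dist + cost
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(queue, (new_dist, nxt))
    return float("inf")  # no chain of relations connects the two concepts

graph = build_graph(EDGES)
print(semantic_distance(graph, "dog", "leash"))  # -> 4.0
```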
10. Formal Semantics
• Formal Semantics combines the philosophy of language and linguistics.
• Denotations of natural language expressions are used to understand the compositionality of words and their references.
• The nature of meaning is the philosophical part of formal semantics.
• The nature of meaning is approached through different theories of truth (Constructivist, Coherence, Correspondence, Consensus and Pragmatic theories).
• Formal Semantics has two approaches:
• Truth Conditions
• Compositionality
• Formal Semantics is related to Lexical Semantics because compositionality and truth conditions change depending on lexical relations.
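Truth conditions and compositionality can be made concrete with a tiny model-theoretic evaluator: each word denotes a set, determiners denote relations between sets (the standard generalized-quantifier treatment), and a sentence's truth value is computed from its parts. The model itself is hypothetical. Swapping one lexical item (“every” for “some”) changes the truth conditions, which is the link to Lexical Semantics above.

```python
# A hypothetical model: each predicate denotes the set of individuals it is true of.
MODEL = {
    "student": {"ann", "bob"},
    "teacher": {"carl"},
    "smiles": {"ann", "bob", "carl"},
    "sleeps": {"bob"},
}

# Determiners denote relations between two sets (generalized quantifiers).
def every(restrictor, scope):
    return restrictor <= scope            # true iff restrictor is a subset of scope

def some(restrictor, scope):
    return bool(restrictor & scope)       # true iff the sets overlap

def no(restrictor, scope):
    return not (restrictor & scope)       # true iff the sets are disjoint

def evaluate(determiner, noun, verb, model=MODEL):
    """[[Det Noun Verb]] = [[Det]]([[Noun]], [[Verb]]): the sentence's truth value
    is computed only from the meanings of its parts and how they combine."""
    return determiner(model[noun], model[verb])

print(evaluate(every, "student", "smiles"))  # "Every student smiles." -> True
print(evaluate(every, "student", "sleeps"))  # "Every student sleeps." -> False
print(evaluate(some, "student", "sleeps"))   # "Some student sleeps."  -> True
```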
11. Formal Semantics and Inquisitive Semantics
• Inquisitive Semantics involves raising new but related issues about a truth value.
• For example: “Aspirin is used against headache. Does it work against toothache?”
• “Toothache” and “headache” here are lexically related as co-hyponyms (both are kinds of “ache”).
• Formal Semantics here helps to determine the truth value of claims about “Aspirin” and its functions.
• Formal Semantics and Truth Conditions have two approaches:
• Dynamic Semantics: The raised issues have to change the context, and the first premise has to be correct.
• Static Semantics: The raised issue doesn’t have to be relevant, and the premise doesn’t have to be true.
• For example: “John gives SEO suggestions as a Googler. Does John give useful SEO suggestions as a Googler?”
• Technically, John’s occupation is not connected to the suggestions’ usefulness.
• Dynamic Semantics changes the context of the previous sentence based on the interpreter and the receiver.
• Multi-stage or chained reasoning is highly relevant to Dynamic Semantics for “context direction”.
Source: Multi-level Recommendation Reasoning over Knowledge Graphs with Reinforcement Learning
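The aspirin example can be sketched with the usual formal picture of these ideas: a context is a set of possible worlds, an assertion updates the context by removing worlds where it is false (dynamic semantics), and a question leaves the worlds in place but splits them into alternative answers (inquisitive semantics). The atom names are hypothetical.

```python
from itertools import product

# Hypothetical atomic facts; a "world" fixes a truth value for each of them.
ATOMS = ("relieves_headache", "relieves_toothache")
WORLDS = [dict(zip(ATOMS, values)) for values in product([True, False], repeat=len(ATOMS))]

def update(context, atom):
    """Dynamic assertion: keep only the worlds where the asserted fact holds."""
    return [w for w in context if w[atom]]

def ask(context, atom):
    """Inquisitive question: remove no worlds, but split the context into the
    alternatives that would resolve the issue (yes / no)."""
    yes = [w for w in context if w[atom]]
    no = [w for w in context if not w[atom]]
    return [alt for alt in (yes, no) if alt]

context = update(WORLDS, "relieves_headache")        # "Aspirin is used against headache."
alternatives = ask(context, "relieves_toothache")    # "Does it work against toothache?"
print(len(WORLDS), len(context), len(alternatives))  # -> 4 2 2
```

The question only makes sense after the first premise has shrunk the context, which mirrors the requirement in Dynamic Semantics that the first premise be correct.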
12. Formal Semantics and Compositionality
• Compositionality is about understanding the lexical relations between subjects and objects.
• The easiest way to get a formal-semantics view of compositionality is to reduce all the meaningful lexical units in the sentence to their initials.
• Take the sentence “Contadu is the best technology for creating a semantical understanding to optimize content”.
• It becomes “C is t-t for s-u to o-c”.
• The structure here shows the composition of words and how lexical relations are constructed with constituent rules.
Source: Compositionality by Henk J. Verkuyl, Utrecht University
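The abbreviation trick can be automated once a sentence is split into function words and content chunks. The chunking below is done by hand (an assumption, not a parser's output). Like the slide's version, it omits “the” and “creating a”, but it treats “best technology” as the first content chunk, so it prints “b-t” where the slide has “t-t”.

```python
def abbreviate(chunks):
    """Keep function words verbatim and shrink each content chunk to its
    hyphen-joined initials, exposing the sentence's compositional skeleton.
    A chunk is a plain string (function word) or a tuple of content words."""
    parts = []
    for chunk in chunks:
        if isinstance(chunk, tuple):
            parts.append("-".join(word[0] for word in chunk))
        else:
            parts.append(chunk)
    return " ".join(parts)

# Hand-made chunking of the example sentence (an assumption, not parser output).
sentence = [("Contadu",), "is", ("best", "technology"), "for",
            ("semantical", "understanding"), "to", ("optimize", "content")]
print(abbreviate(sentence))  # -> C is b-t for s-u to o-c
```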
13. Formal Semantics and Scope
• Scope determines the range over which a specific declaration is valid.
• Formal semantics helps machines process human language to understand the specific scope.
• For example:
• “Every student has a favourite teacher.” -> It is not clear whether every student has the same favourite teacher, each student has a different favourite teacher, or some students share a favourite teacher while others do not.
• “When three more votes are taken from the court, the decision will be as we want.” -> The unclear part is why three, and which three. Does the court have different layers of officials with different vote values, do specific officials (“X, Y, Z”) need to vote, and which other decision-makers are against the decision the person wants? This is an example of Inquisitive Semantics; use it for question generation.
• There are other types of scope, such as “scope islands” and “exceptional scope”.
Source: Context-Sensitivity and Individual Differences in the Derivation of Scalar Implicature
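The “favourite teacher” ambiguity comes down to which quantifier takes wide scope, and the two readings can be checked against a model. The students, teachers and their preferences below are hypothetical.

```python
# Hypothetical model: which teacher each student names as their favourite.
TEACHERS = {"ms_lee", "mr_roy"}

def surface_scope(favourite):
    """'every > a': each student has some favourite teacher (possibly different ones)."""
    return all(teacher in TEACHERS for teacher in favourite.values())

def inverse_scope(favourite):
    """'a > every': one single teacher is the favourite of every student."""
    return any(all(fav == t for fav in favourite.values()) for t in TEACHERS)

mixed = {"ann": "ms_lee", "bob": "mr_roy", "cem": "ms_lee"}
shared = {"ann": "ms_lee", "bob": "ms_lee", "cem": "ms_lee"}
print(surface_scope(mixed), inverse_scope(mixed))    # -> True False
print(surface_scope(shared), inverse_scope(shared))  # -> True True
```

The same sentence is true under both readings in one model and under only one reading in the other, which is why a system has to resolve scope before it can extract a single fact.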
14. Formal Semantics and Scope
• Scope islands are called islands because a quantifier inside them can’t take scope outside of them.
• For example: “If every elephant in the sanctuary gains 5 pounds every 6 months, I will get a promotion.” The person doesn’t get another promotion each time an elephant gains 5 pounds in 6 months; the promotion happens once.
• Exceptional scope lets an indefinite such as “a” escape the scope island.
• For example: “If an elephant gains 5 pounds, I will get a promotion.” Here, ambiguity and repetition occur together: “an elephant” can mean one specific elephant, or any elephant, possibly each time one gains weight.
• Scope is important for Compositionality, and Compositionality is important for Lexical Semantics.
Source: Creation of inferred queries for use as query suggestions
15. Formal Semantics and Modality
• Modality is part of Formal Semantics, connected to propositional content and philosophical logic. There are different modalities:
• Permissible: Expresses acts that are allowed.
• Possible: Expresses acts that are possible.
• Quintessential: Expresses the acts’ features.
• Evidential: Expresses facts with a factual source.
• Habitual: Expresses habits.
• Iterative: Expresses repeated acts.
• Frequentative: Expresses permanent facts.
Source: Semantic frame identification with distributed word representations
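A first approximation of modality detection is matching cue phrases. The cue lists below are hypothetical and cover only the modality types with obvious surface markers (Quintessential and Frequentative are omitted); a real system would learn such cues rather than hard-code them.

```python
import re

# Hypothetical cue lists per modality type (illustrative, far from complete).
MODALITY_CUES = {
    "permissible": ["may", "allowed to", "permitted to"],
    "possible": ["might", "could", "possibly"],
    "evidential": ["according to", "reportedly", "studies show"],
    "habitual": ["usually", "typically", "tends to"],
    "iterative": ["again", "repeatedly"],
}

def tag_modality(sentence):
    """Return the sorted modality types whose cue phrases occur as whole words."""
    text = sentence.lower()
    found = set()
    for modality, cues in MODALITY_CUES.items():
        for cue in cues:
            if re.search(r"\b" + re.escape(cue) + r"\b", text):
                found.add(modality)
                break
    return sorted(found)

print(tag_modality("According to the label, aspirin might usually help."))
# -> ['evidential', 'habitual', 'possible']
```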
16. Formal Semantics and Binding
• Binding creates a bond between the predicate and the subject. Anaphors are used to express the connections between bound predicates and subjects.
• Modality expresses the features of lexical relations, while binding expresses their direction.
• In the sentence “Nancy Pelosi must be the next presidential candidate for her career”, “must be” expresses a “strong possibility”, while “career” is bound to “Nancy Pelosi” through the anaphor “her”.
• Set theory works here: it creates a set of “people who must be the next candidates for the presidential election”, treats “being a presidential candidate” as a possible “political career improvement” act, and makes “presidential candidate” a topic connected to other types of “candidacies”, while “political career steps” and “political discussions” are connected to it.
• Binding and modality work together to create an Information Graph.
• If the sentence changes to “Nancy Pelosi is the best possible candidate for every Democrat in the US.”, it expresses possibility through a different “modality”, and the concept of “scope” applies again.
• The declaration states that “Nancy Pelosi is a candidate” for “every Democrat in the US”. This explains the “scope” and “compositionality”.
• Compositionality here is “N is a c for e d in the U.S.”
• The main issue here is that the scope doesn’t make sense. If a Democrat goes outside of the US, does it mean that “Nancy Pelosi is suddenly not the best candidate” anymore? Or is she literally the best candidate for every Democrat?
• Thus, the scope here affects the “modality” further, and makes the “possibility” “opinionated” rather than a “factual possibility”.
• The Formal Semantics components affect each other.
• The output of Formal Semantics affects Lexical Semantics.
• Lexical Semantics affects Lexical Relations.
• Lexical Relations affect the Information Graph and Information Extraction.
• Information Extraction determines the Knowledge Base (Raw Knowledge Graph).
Source: Providing result-based query suggestions
17. Formal Semantics and T-A-M (Tense-Aspect-Mood)
• Tense-aspect-mood combinations help extract information and relate lexicosemantic units to each other within a data graph.
• Tense involves the position of the action on the timeline.
• Past, Present, Future
• Aspect involves how the action extends along the timeline.
• Unitary – Happened once and suddenly.
• Continuous – Happens over a period of time.
• Repeated – Happened repeatedly, will happen again.
• Mood (modality) involves the actuality of the action.
• Possibility: Might happen.
• Necessity: Should happen.
Source: Extracting Semantic Classes from Text
18. Transition from Lexical Semantics to Query Semantics
• Query Semantics and Lexical Semantics are different from each other but highly similar.
• Words that look unrelated, or even opposite, in Lexical Semantics might be relevant to each other in Query Semantics.
• For example, “Buy” and “Sell” are opposites, or antonyms, of each other.
• In Query Semantics, “Buy” and “Sell” can be synonyms; in other words, they can mean the same thing.
• “Soft Drinks” is a different concept from “Coca Cola”. “Soft Drinks” is a hypernym of “Coca Cola” in Lexical Semantics, but in Query Semantics they might be synonyms.
19. Transition from Lexical Semantics to Query Semantics
• Query Semantics is used for “Query Inference” and “Query Phrasification”.
• The query “Best temperature for Soft Drink” targets a hypernym in Lexical Semantics.
• Query Semantics is used to generate the same search query for the other members of the same set, because they are also synonyms in query semantics.
• “Soft drinks such as Coca Cola” and “Coca Cola (Soft Drink)” don’t represent the same thing in Query Semantics.
• The second phrase is more relevant to “Coca Cola”, while the first one is more relevant to the entire “class of things”.
20. Transition from Lexical Semantics to Query Semantics
• The “Best temperature for pepsi” query requires further query processing with lexicosemantics and query semantics.
• “Best temperature for pepsi” has a missing part:
• For drinking
• For serving
• For producing
• For storing
• For mixing
• All the possible “verbs” come from “lexical semantics” and from how they are used in the “query search” language.
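Slides 19 and 20 describe two query-inference steps: generating the same query for every member of a hypernym's set, and filling the missing predicate. A minimal template-based sketch, assuming a hand-written lexicon (`HYPONYMS` and `DRINK_PREDICATES` are hypothetical data, with the predicates taken from the list above):

```python
# Hypothetical lexicon data; a real system would mine these from documents and query logs.
HYPONYMS = {"soft drink": ["coca cola", "pepsi", "sprite"]}
DRINK_PREDICATES = ["drinking", "serving", "producing", "storing", "mixing"]

def infer_queries(query, hypernym):
    """Query inference: rewrite the query for every member of the hypernym's set,
    treating the hyponyms as query-semantic synonyms of the hypernym."""
    return [query.replace(hypernym, member) for member in HYPONYMS.get(hypernym, [])]

def complete_predicate(query, predicates=DRINK_PREDICATES):
    """Fill the missing verb in 'X for Y' queries: 'X for <verb> Y'."""
    head, sep, obj = query.partition(" for ")
    if not sep:
        return [query]  # no "for" slot to fill
    return [f"{head} for {verb} {obj}" for verb in predicates]

print(infer_queries("best temperature for soft drink", "soft drink"))
print(complete_predicate("best temperature for pepsi")[:2])
# -> ['best temperature for drinking pepsi', 'best temperature for serving pepsi']
```

Plain string templates are used to keep the idea visible; they would break on queries that do not follow the “X for Y” pattern, which the function simply returns unchanged.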
21. Formal Semantics and T-A-M (Tense-Aspect-Mood)
• Formal Semantics and T-A-M affect lexical semantics.
• The “tense”, “aspect” and “mood” combinations create different lexical relations with contexts.
22. Transition from Lexical Semantics to Query Semantics
• Even the smallest differences in queries and words can change rankings, even if the search intent is the same or the queries mean the same thing.
Source: Compositionality by Henk J. Verkuyl, Utrecht University
what should happen to someone who has hemophilia
what can happen to someone who has hemophilia
what happens to someone who has hemophilia
23. Formal Semantics and T-A-M (Tense-Aspect-Mood)
• The modality “should” represents a responsibility and a solution to a problem.
• Thus, the results focus on “treatment” or “precaution”, even if the rest of the sentence is the same.
what should not happen to someone who has hemophilia
what will not happen to someone who has hemophilia
what happened to someone who has hemophilia
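The hemophilia query variants differ only in tense, mood and negation, so even crude cue-word rules can separate them. The cue sets below are illustrative assumptions, not a complete grammar, and aspect is left out.

```python
import re

def tam_features(query):
    """Rough tense/mood/negation tags from cue words (aspect is left out).
    The cue sets are illustrative assumptions, not a complete grammar."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    if "will" in words:
        tense = "future"
    elif words & {"happened", "was", "were", "did"}:
        tense = "past"
    else:
        tense = "present"
    if "should" in words:
        mood = "necessity"      # the slide ties "should" to treatment/precaution results
    elif words & {"can", "could", "might", "may"}:
        mood = "possibility"
    else:
        mood = "indicative"
    return {"tense": tense, "mood": mood, "negated": "not" in words}

for q in ["what should not happen to someone who has hemophilia",
          "what will not happen to someone who has hemophilia",
          "what happened to someone who has hemophilia"]:
    print(q, "->", tam_features(q))
```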
24. Formal Semantics and T-A-M (Tense-Aspect-Mood)
• Lemmatization of forms such as “effected” and “effective” brings answers closer.
• The predicate “show” is closer to “demonstrate”, and to “metrics” or “tests”.
• The predicates and their possible compositionalities have different types of themes.
what shows happen to someone who has hemophilia
what effected to someone who has hemophilia
33. Query Semantics
• We also see that “Cat” and “Dog” can be synonyms.
• “Part-time” and “Full-time” can be synonyms.
• But sometimes they are not synonyms.
• For the query “find job”, “part-time” and “full-time” might be synonyms.
• For the query “buy pet”, “cat” and “dog” might be synonyms.
• But for “dog food”, they are not synonyms.
• “Sign in” and “Sign on” might or might not be synonyms.
• “Address” might mean contact information, or just the postal address.
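Context-dependent synonymy can be represented by storing, for each term pair, the query contexts in which the pair is interchangeable. The table below is hypothetical and only encodes the examples from this slide.

```python
# Hypothetical table: term pair -> query contexts where the pair acts as synonyms.
CONTEXTUAL_SYNONYMS = {
    frozenset({"cat", "dog"}): {"buy pet"},
    frozenset({"part-time", "full-time"}): {"find job"},
}

def are_synonyms(a, b, query_context):
    """Two terms count as synonyms only if the current query context licenses it."""
    contexts = CONTEXTUAL_SYNONYMS.get(frozenset({a, b}), set())
    return query_context in contexts

print(are_synonyms("cat", "dog", "buy pet"))   # -> True
print(are_synonyms("dog", "cat", "dog food"))  # -> False
```

Keying on a `frozenset` makes the lookup order-independent, so (“cat”, “dog”) and (“dog”, “cat”) hit the same entry.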
34. Query Semantics
• New York is not York.
• “York hotels” doesn’t mean “New York hotels”.
• But “Vegas” is always “Las Vegas”.
• If you search from Latin America, “York” is New York.
• If you search from Africa, “York” is still New York.
• If you search from France, it is 50/50.
• If you search from the UK, “York” is, again, not New York.
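The region-dependent behaviour described above can be encoded as a rewrite rule keyed on the searcher's region. The region labels are hypothetical identifiers; the mappings simply transcribe the slide's observations, with France left ambiguous.

```python
# Interpretations of a bare "york" by searcher region, transcribed from the observations above.
YORK_BY_REGION = {
    "latin_america": "new york",
    "africa": "new york",
    "uk": "york",
    "france": None,  # roughly 50/50: leave the query ambiguous
}
ALWAYS = {"vegas": "las vegas"}  # expansions that hold regardless of region

def interpret(query, region):
    words = query.lower().split()
    out = []
    for i, w in enumerate(words):
        if w in ALWAYS and (i == 0 or f"{words[i - 1]} {w}" != ALWAYS[w]):
            out.append(ALWAYS[w])  # expand only if the full phrase isn't already there
        elif w == "york" and (i == 0 or words[i - 1] != "new"):
            # Bare "york": resolve by region; unknown or ambiguous regions keep it as-is.
            out.append(YORK_BY_REGION.get(region) or "york")
        else:
            out.append(w)
    return " ".join(out)

print(interpret("york hotels", "africa"))  # -> new york hotels
print(interpret("york hotels", "uk"))      # -> york hotels
print(interpret("vegas shows", "uk"))      # -> las vegas shows
```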
35. Query Semantics
• “New” appears alone a lot.
• “York” sometimes appears without “New”.
• The combination of phrases in documents helps search engines relate these things to each other or differentiate them.
• How documents use query phrases determines how people search.
• How people search affects how people use query phrases.
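How documents combine “New” and “York” can be measured with a simple co-occurrence statistic: the share of “york” occurrences that directly follow “new”. The four-document corpus is made up for illustration; a search engine would compute such statistics over its whole index.

```python
import re

def preceded_by_ratio(documents, word, previous):
    """Share of `word` occurrences that come right after `previous` across the documents.
    A high ratio suggests the two usually form one phrase (e.g. "new york")."""
    total = together = 0
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for i, tok in enumerate(tokens):
            if tok == word:
                total += 1
                if i > 0 and tokens[i - 1] == previous:
                    together += 1
    return together / total if total else 0.0

docs = ["Hotels in New York", "York Minster is in York", "New York pizza", "A new phone"]
print(preceded_by_ratio(docs, "york", "new"))  # -> 0.5
```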
36. Query Semantics
• Bonus: Is it worth indexing?
• Even if 1,000,000 searches happen every day?
• What are the synonyms of facial expressions?
37. Query Semantics
• “Prove the cost is worth it”.
• Is it worth that cost if you do not use lexicosemantics?
38. Let’s talk about “porn”.
• This is Matt Cutts.
• His first big task at Google was finding “spammy”, or sometimes not spammy but highly “sexual”, queries.
• Why?
• S A F E S E A R C H.
39. Let’s talk about “porn”.
• And how do you find all this porn?
• How do people search for porn?
• Matt Cutts was an expert on Web Spam, because adult websites use spam a lot.
• “Think twice if your manager asks you what you think about porn.”
• -Matt Cutts
40. Let’s talk about “porn”.
• Matt Cutts used 69 languages and synonyms to find phrases related to porn.
• “I didn’t think about this before. People search porn with lots of different weird words.”
• Matt Cutts tried to convince Google employees to search for porn in weird ways.
• He distributed “cookies”; this is how the “Google Cookie Porn” events happened.
• Lexicosemantics and Query Semantics were tested for the first time across all of Google.