  1. 1. Lexical & Query Semantics Differences for Information Retrieval: Why PageRank is Sometimes Better for Semantics
  2. 2. Closing the Gap between Search Query Language and Document Language • There are three components of Information Retrieval Systems: Query Understanding, Document-Query Relevance Understanding, and Document Clustering and Ranking. • The path from a “search query” to a “search document” involves query parsing, processing, augmenting, scoring, ranking and clustering. • Query Understanding is where SEO starts. • Document Creation is where SEO continues. • Document Ranking is where SEO repeats itself. Source: Query Language Determination Using Query Terms and Interface Language
  3. 3. What is this Search Query Language? • The Search Query Language was invented in the Cranfield Experiments in the late 1950s. • Scientists realized that when “querying a document”, the language becomes denser and words change their meaning. • There is a huge vocabulary difference between “queries” and “documents”. • This is because people do not know what to ask a search engine; they only know what represents the topic. • The “query language” uses “knowledge representation” with “dense vectors”. • Query Term Weight Calculation was born during these experiments. Source: Augmenting Queries With Synonyms From Synonyms Map
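The slide mentions query term weight calculation without giving a formula. As a rough illustration only, the sketch below weights query terms by smoothed inverse document frequency (IDF), a widely used weighting that is swapped in here as a stand-in; it is not claimed to be the method used in the Cranfield Experiments. The toy corpus, the function names `idf_weights` and `weight_query`, and the rule for unseen terms are all assumptions.

```python
import math
from collections import Counter


def idf_weights(documents: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in a toy corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(doc.lower().split()))
    return {
        term: math.log((n_docs + 1) / (df + 1)) + 1.0
        for term, df in doc_freq.items()
    }


def weight_query(query: str, idf: dict[str, float]) -> dict[str, float]:
    """Give each query term its IDF weight.

    Assumption: a term never seen in the corpus gets the highest known weight.
    """
    default = max(idf.values())
    return {term: idf.get(term, default) for term in query.lower().split()}


# Hypothetical three-document corpus on an aerospace topic.
corpus = [
    "the wing of the aircraft",
    "the engine of the aircraft",
    "the boundary layer on the wing",
]
weights = weight_query("the wing boundary", idf_weights(corpus))
```

In this toy corpus, “the” occurs in every document and gets the lowest weight, while “boundary” occurs in only one and gets the highest, which is the intuition behind weighting rare query terms more heavily.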
  4. 4. Query Search Language • The Cranfield Experiments, led by Cyril W. Cleverdon, are among the first Information Retrieval experiments. • They were designed to test the efficiency of indexing systems. • Vannevar Bush’s paper “As We May Think” is cited in the research. • The Cranfield Experiments introduced the “Search Language” concept to acknowledge that words change their meanings inside search queries, even when they are used the same way inside documents. • Information Retrieval has to distinguish between “understanding relevance” and “understanding the query”. • To understand the query, a search engine can’t use the language model it uses for understanding documents. • Document language and query language are completely different. • Inside documents, we see “lexical semantics”. • Inside queries, we see “query semantics” with “search language”. Source: “As We May Think”, Vannevar Bush; Cranfield Experiments, Cyril W. Cleverdon, 1958.
  5. 5. An Algorithm Doesn’t Have to Be Liked by Your Logic • An algorithm doesn’t have to make sense. • An algorithm has to be useful. • The Cranfield Experiments have been debated for decades, and they are still cited by new research. • The Cranfield Experiments do not explain why their method works; they only show that it works. • The experiments asked test subjects to take documents on the “aerospace” topic and write “keywords” or “search queries” for that topic. • Test subjects ranked the documents based on their own query terms and their own judgement. • The Cranfield Experiments created the concepts of “search language” and “document language”, along with “natural language query”. Source: Query Generation Using Structural Similarity Between Documents
  6. 6. Lexical Semantics • Lexicosemantics involves word-sense disambiguation, word compositionality and the syntax-semantics interface of language. • Lexicosemantics supports Formal Semantics (of natural language). • Formal Semantics studies the grammatical meaning of natural language with tools from theoretical computer science. • Lexical Semantics helps in the construction of WordNets, FrameNets, Knowledge Bases and Index Tiers. • Lexical Semantics helps Search Engines process a text item to understand the “Semantic Scope” of sentences through “modality”, “tense”, “binding”, “aspect”, and pragmatics. • Lexical semantics involves hyponymy, hypernymy, antonymy, homonymy, polysemy, meronymy, holonymy and semantic networks. Source: Query Generation Using Structural Similarity Between Documents
  7. 7. Do You Remember the Google Merge? • What if Google became a semantic search engine by buying another one? • Oingo was the first search engine focused on meaning-based relevance and advertisement. • It became “Applied Semantics” in 2001. • Google and Applied Semantics merged on 18 April, 2003.
  8. 8. Applied Semantics (Oingo): The First Conceptual Search Engine • Applied Semantics was created by Eytan Elbaz in 1999. • Information Extraction and Information Responsiveness work differently from Information Retrieval. • Lexical relations do not carry the meaning of query terms, but query semantics do. Thus, query semantics were used for the first time to augment and expand a query. • It is one of the first designs to mention “semantic distance” and “relationship strength” for creating a semantic network of concepts. • It paved the way for “Index Tiering”. Typically, search engines match the search terms to the documents as a whole. If the user is interested in specific information, for example, “sharks”, but a particular document about “beaches around the world”, for example, only has one sentence about sharks, it is unlikely that the search engine would return the document. Documents like the one described are likely to score very low under the query for “sharks”, if at all, because the document as a whole is not “about” sharks. Source: Methods and systems for detecting and extracting information
  9. 9. Do You Remember the Google Merge? • Similarity (“gluttonous” is similar to “greedy”): near synonyms • Membership (“commissioner” is a member of “commission”) • Meronymy (whole/part relations) (“motor vehicle” has part “clutch pedal”) • Substance (e.g. “lumber” has substance “wood”) • Product (e.g. “Microsoft Corporation” produces “Microsoft Access”) • Attribute (“past”, “preceding” are attributes of “timing”) • Causation (e.g. travel causes displacement/motion) • Entailment (e.g. buying entails paying) • Lateral bonds (concepts closely related to one another, but not in one of the other relationships, e.g. “dog” and “dog collar”) • Capitonyms (Polish (nation), polish (shining)) • Troponym (Walking -> Hustle, Trot, Crawl) • Eponym (Tommy John Surgery, Biswanath Panda -> Panda Update) • Demonym (New Yorkers -> population of New York City, not the State) • Acronyms (NASA, North American Saxophone Alliance, National Auto Sport Association, National Association of Students of Architecture) • Source: Bill Slawski
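To make “semantic distance” and “relationship strength” from the previous two slides more concrete, here is a minimal sketch of a concept network with weighted relations. It is not the Applied Semantics design: the strength values, the extra concept “leash”, the cost rule (cost = 1 / strength) and the function names are assumptions chosen for illustration. Distance is computed with Dijkstra's shortest-path algorithm.

```python
import heapq

# (concept_a, relation, concept_b, strength in (0, 1]).
# Relation examples follow the slide; every strength value is made up,
# and "leash" is an invented concept used to show a two-hop path.
RELATIONS = [
    ("gluttonous", "similarity", "greedy", 0.9),
    ("motor vehicle", "has_part", "clutch pedal", 0.6),
    ("lumber", "has_substance", "wood", 0.8),
    ("dog", "lateral_bond", "dog collar", 0.5),
    ("dog collar", "lateral_bond", "leash", 0.5),
]


def build_graph(relations):
    """Undirected graph; stronger relationships get cheaper edges."""
    graph = {}
    for a, _relation, b, strength in relations:
        cost = 1.0 / strength  # assumed cost rule
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def semantic_distance(graph, source, target):
    """Shortest-path cost between two concepts; inf if they are unconnected."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, cost in graph.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return float("inf")


graph = build_graph(RELATIONS)
dog_to_leash = semantic_distance(graph, "dog", "leash")
dog_to_wood = semantic_distance(graph, "dog", "wood")
```

With these made-up strengths, “dog” reaches “leash” through “dog collar”, while “dog” and “wood” share no path and are treated as infinitely distant.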
  10. 10. Formal Semantics • Formal Semantics combines the philosophy of language with linguistics. • Denotations of natural language expressions are used to understand the compositionality of words and their references. • The nature of meaning is the philosophical part of formal semantics. • The nature of meaning involves the meanings that come from our nature (Constructivist, Coherence, Correspondence, Consensus, Pragmatic Theories). • Formal Semantics has two approaches: Truth Conditions and Compositionality. • Formal Semantics is related to Lexical Semantics because compositionality and truth conditions change based on lexical relations.
  11. 11. Formal Semantics and Inquisitive Semantics • Inquisitive Semantics involves raising new but related issues about a truth value. • For example: “Aspirin is used against headache. Does it work against toothache?” • Here “toothache” and “headache” are lexically related to each other as co-hyponyms (both are kinds of ache). • Formal Semantics helps here to understand the truth value of “Aspirin” and its functions. • Formal Semantics and Truth Conditions have two approaches. • Dynamic Semantics: The raised issue has to change the context, and the first premise has to be correct. • Static Semantics: The raised issue doesn’t have to be relevant, and the premise doesn’t have to be true. • For example: “John gives SEO Suggestions as a Googler. Does John give useful SEO Suggestions as a Googler?” • Technically, John’s occupation is not connected to the usefulness of the suggestions. • Dynamic Semantics changes the context of the previous sentence based on the interpreter and the receiver. • Multi-stage or chained reasoning is highly relevant to Dynamic Semantics for “context direction”. Source: Multi-level Recommendation Reasoning over Knowledge Graphs with Reinforcement Learning
  12. 12. Formal Semantics and Compositionality • Compositionality is about understanding the lexical relations between subjects and objects. • The easiest way to get a formal-semantics understanding of compositionality is to remove all the meaningful lexical units from the sentence. • Take the sentence “Contadu is the best technology for creating a semantical understanding to optimize content”. • It becomes “C is t-t for s-u to o-c”. • The structure here gives the composition of words, and shows how lexical relations are constructed with constituent rules. Source: Compositionality by Henk J. Verkuyl, Utrecht University
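The idea of stripping the meaningful lexical units to expose sentence structure can be approximated with a few lines of code. This is only a rough sketch: it reduces each content word to its initial and keeps function words, so its output differs from the slide's hand-made “C is t-t for s-u to o-c”, which also collapses whole phrases. The `FUNCTION_WORDS` list and the `skeleton` function are assumptions, not part of the deck.

```python
# Small, hand-picked list of function words to keep (an assumption;
# a real system would use a part-of-speech tagger instead).
FUNCTION_WORDS = {"a", "an", "the", "is", "are", "for", "to", "of", "in", "and"}


def skeleton(sentence: str) -> str:
    """Keep function words, reduce every other word to its first letter."""
    tokens = sentence.rstrip(".").split()
    return " ".join(
        token if token.lower() in FUNCTION_WORDS else token[0]
        for token in tokens
    )


sentence = (
    "Contadu is the best technology for creating "
    "a semantical understanding to optimize content"
)
structure = skeleton(sentence)
# structure == "C is the b t for c a s u to o c"
```

Two sentences with the same skeleton share the same composition, even when their content words, and therefore their lexical relations, differ.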
  13. 13. Formal Semantics and Scope • Scope determines the range over which a specific declaration is valid. • Formal semantics helps machines process human language to understand that specific scope. • For example: • “Every student has a favourite teacher”. -> It is not clear whether every student has the same favourite teacher, whether all of them have different favourite teachers, or whether some share a favourite teacher and some do not. • “When three more votes are taken from the court, the decision will be as we want.” -> The unclear part here is why three, and which three. Does the court have different layers of officials with different vote values? Are officials “X, Y, Z” specifically needed to vote? Which other decision-makers are against the decision that the person wants? -> This is an example of Inquisitive Semantics; use it for question generation. • There are other types of scope, such as “scope islands” and “exceptional scopes”. Source: Context-Sensitivity and Individual Differences in the Derivation of Scalar Implicature
  14. 14. Formal Semantics and Scope • Scope islands are called islands because their content can’t be taken out of that scope (island). • For example: “If every elephant in the sanctuary gains 5 pounds every 6 months, I will get a promotion”. -> The person doesn’t get another promotion each time an elephant gains 5 pounds in 6 months. It happens once. • Exceptional Scope reverses scope islands through the indefinite “a”. • For example: “If an elephant gains 5 pounds, I will get a promotion”. -> Ambiguity and repetition occur together. • Scope is important for Compositionality, and Compositionality is important for Lexical Semantics. Source: Creation of inferred queries for use as query suggestions
  15. 15. Formal Semantics and Modality • Modality is the part of Formal Semantics that deals with propositional content and philosophical logic. There are different modalities: • Permissible: Expresses acts that are allowed. • Possible: Expresses acts that are possible. • Quintessential: Expresses the features of acts. • Evidential: Expresses facts with a factual source. • Habitual: Expresses habits. • Iterative: Expresses repeated acts. • Frequentative: Expresses permanent facts. Source: Semantic frame identification with distributed word representations
  16. 16. Formal Semantics and Binding • Binding creates a bond between the predicate and the subject. Anaphors are used to express the connections between bound predicates and subjects. • Modality expresses the features of lexical relations, while binding expresses their direction. • In the sentence “Nancy Pelosi must be next presidential candidate for her career”, “must be” involves “strong possibility”, while “career” is bound to “Nancy Pelosi”. • Set theory works here to create the set “People who must be next candidates for presidential election”, with “being a presidential candidate” as a possible “political career improvement” act; “presidential candidate” becomes a topic that connects to other types of “candidacies”, while “political career steps” and “political discussions” are connected to it. • Binding and modality work together to create an Information Graph. • If the sentence changes to “Nancy Pelosi is the best possible candidate for every democrat in the US.”, the sentence carries a possibility from a different “modality”, and the concept of “scope” applies again. • The declaration says that “Nancy Pelosi is a candidate” for “every Democrat in the US”. This explains the “scope” and the “compositionality”. • The compositionality here is “N is a c for e d in the U.S”. • The main issue is that the scope doesn’t make sense. If a Democrat leaves the US, does it mean that “Nancy Pelosi is suddenly not the best candidate” anymore? Or is she literally the best candidate for every Democrat? • Thus, the scope affects the “modality” further, and makes the “possibility” “opinionated” rather than a “factual possibility”. • The components of Formal Semantics affect each other. • The output of Formal Semantics affects Lexical Semantics. • Lexical Semantics affects Lexical Relations. • Lexical Relations affect the Information Graph and Extraction. • Information Extraction determines the Knowledge Base (Raw Knowledge Graph). Source: Providing result-based query suggestions
  17. 17. Formal Semantics and T-A-M (Tense-Aspect-Mood) • Tense-aspect-mood has different combinations for extracting information and relating lexicosemantics to each other within a data graph. • Tense involves the position of the action on the timeline: Past, Present, Future. • Aspect involves the extension of the state of the action over the timeline. • Unitary: Happened once and suddenly. • Continuous: Happens over a period of time. • Repeated: Happened repeatedly, will happen again. • Mood (modality) involves the actuality of the action. • Possibility: Might happen. • Necessity: Should happen. Source: Extracting Semantic Classes from Text
  18. 18. Transition from Lexical Semantics to Query Semantics • Query Semantics and Lexical Semantics are different from each other, yet closely related. • Words that are not synonyms lexically can still be treated as equivalent in Query Semantics. • For example, “Buy” and “Sell” are opposites (antonyms) in Lexical Semantics. • In Query Semantics, “Buy” and “Sell” are synonyms; in other words, they mean the same thing. • “Soft Drinks” is a different concept from “Coca Cola”. “Soft Drinks” is a hypernym of “Coca Cola” in Lexical Semantics, but in Query Semantics they might be synonyms.
  19. 19. Transition from Lexical Semantics to Query Semantics • Query Semantics is used for “Query Inference” and “Query Phrasification”. • The query “Best temperature for Soft Drink” is a query for a hypernym in Lexical Semantics. • Query Semantics is used to generate the same search query for other members of the same set, because they are also synonyms in query semantics. • “Soft drinks such as Coca Cola” and “Coca Cola (Soft Drink)” do not represent the same thing in Query Semantics. • The second phrase is more relevant to “Coca Cola”, while the first one is more relevant to the entire “class of things”.
  20. 20. Transition from Lexical Semantics to Query Semantics • The query “Best temperature for pepsi” requires further query processing with lexicosemantics and query semantics. • “Best temperature for pepsi” has a missing part. • For drinking • For serving • For producing • For storing • For mixing • All the possible “verbs” come from “lexical semantics” and from how they are used in the “query search” language.
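The last two slides describe two expansions: generating the same query for other members of a class, and filling in the missing activity. A minimal sketch of both, assuming a hand-written class table and activity list (a real system would derive these from lexical resources and query logs); `CLASS_MEMBERS`, `ACTIVITIES`, `sibling_entities` and `expand_query` are hypothetical names.

```python
from itertools import product

# Hypothetical lookup tables, written by hand for this example.
CLASS_MEMBERS = {"soft drink": ["coca cola", "pepsi"]}
ACTIVITIES = ["drinking", "serving", "producing", "storing", "mixing"]


def sibling_entities(entity, class_members=CLASS_MEMBERS):
    """Return every member of the class that contains the entity."""
    for members in class_members.values():
        if entity in members:
            return list(members)
    return [entity]  # unknown entity: no siblings to add


def expand_query(prefix, entity, activities=ACTIVITIES):
    """Fill the missing activity and repeat the query for sibling entities."""
    entities = sibling_entities(entity)
    return [
        f"{prefix} {activity} {name}"
        for name, activity in product(entities, activities)
    ]


candidates = expand_query("best temperature for", "pepsi")
# 2 entities x 5 activities = 10 candidate queries, for example
# "best temperature for storing pepsi".
```

Each candidate could then be scored against query logs or document evidence to decide which interpretation of the original query is most likely.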
  21. 21. Formal Semantics and T-A-M (Tense-Aspect-Mood) • Formal Semantics and T-A-M affect lexical semantics. • Combinations of “tense”, “aspect” and “mood” create different lexical relations depending on context.
  22. 22. Transition from Lexical Semantics to Query Semantics • The smallest differences in queries and words can change rankings, even if the search intent is the same or the queries mean the same thing. • Example queries: “what should happen to someone who has hemophilia”; “what can happen to someone who has hemophilia”; “what happens to someone who has hemophilia”. Source: Compositionality by Henk J. Verkuyl, Utrecht University
  23. 23. Formal Semantics and T-A-M (Tense-Aspect-Mood) • The modal “should” represents a responsibility and a solution to a problem. • Thus, the result focuses on “treatment” or “precaution”, even if the rest of the sentence is the same. • Example queries: “what should not happen to someone who has hemophilia”; “what will not happen to someone who has hemophilia”; “what happened to someone who has hemophilia”.
  24. 24. Formal Semantics and T-A-M (Tense-Aspect-Mood) • Lemmatization of forms such as “effected” and “effective” brings the answers closer together. • The predicate “show” is closer to “demonstrate”, and to “metrics” or “tests”. • The predicates and their possible compositionalities have different types of themes. • Example queries: “what shows happen to someone who has hemophilia”; “what effected to someone who has hemophilia”.
  25. 25. Query-Document Vocabulary Gap
  26. 26. Query-Document Vocabulary Gap
  27. 27. Query-Document Vocabulary Gap
  28. 28. Query-Document Vocabulary Gap
  29. 29. Query-Document Vocabulary Gap
  30. 30. Query-Document Vocabulary Gap
  31. 31. Query Semantics
  32. 32. Query Semantics
  33. 33. Query Semantics • We also see that “Cat” and “Dog” can be synonyms. • “Part-time” and “Full-time” can be synonyms. • But sometimes they are not synonyms. • For the query “find job”, “Part-time” and “Full-time” might be synonyms. • For the query “buy pet”, “Cat” and “Dog” might be synonyms. • But for “dog food”, they are not synonyms. • “Sign in” and “Sign on” might or might not be synonyms. • “Address” might mean contact details, or just the address itself.
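The point that two terms are synonyms only in some query contexts can be expressed as a lookup keyed by context. This is a toy sketch using the slide's own examples; the table layout and the function name `are_query_synonyms` are assumptions, and a real search engine would learn such groupings from query and click data rather than from a hand-written table.

```python
# Query context -> groups of terms treated as interchangeable in that context.
# Entries come from the slide's examples; the structure itself is an assumption.
CONTEXT_SYNONYMS = {
    "buy pet": [{"cat", "dog"}],
    "find job": [{"part-time", "full-time"}],
}


def are_query_synonyms(term_a, term_b, context, table=CONTEXT_SYNONYMS):
    """True if both terms fall into one synonym group for this query context."""
    if term_a == term_b:
        return True
    return any(
        term_a in group and term_b in group
        for group in table.get(context, [])
    )


in_pet_context = are_query_synonyms("cat", "dog", "buy pet")
in_food_context = are_query_synonyms("cat", "dog", "dog food")
```

Here “cat” and “dog” are interchangeable for “buy pet” but not for “dog food”, because the second context has no group that contains both terms.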
  34. 34. Query Semantics • New York is not York. • “York Hotels” doesn’t mean “New York Hotels”. • But Vegas is always Las Vegas. • If you search from Latin America, York is New York. • If you search from Africa, York is still New York. • If you search from France, it is 50/50. • If you search from the UK, it is, again, not New York.
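The location dependence described on this slide can be sketched as a per-region probability that “York” means “New York”. The numbers below simply encode the slide's statements (“is New York” as 1.0, “50/50” as 0.5, “not New York” as 0.0); they are an illustrative encoding, not measured data, and the names `P_NEW_YORK_GIVEN_REGION` and `interpret_york` are assumptions.

```python
# Probability that the query term "York" refers to New York, by searcher region.
# Values are a direct, illustrative encoding of the slide's claims.
P_NEW_YORK_GIVEN_REGION = {
    "latin america": 1.0,
    "africa": 1.0,
    "france": 0.5,
    "uk": 0.0,
}


def interpret_york(region, threshold=0.5):
    """Pick an interpretation of "York" for a region, or flag it as unclear."""
    probability = P_NEW_YORK_GIVEN_REGION.get(region.lower())
    if probability is None:
        return "unknown"  # region not covered by the table
    if probability > threshold:
        return "New York"
    if probability < threshold:
        return "York"
    return "ambiguous"


from_france = interpret_york("France")
from_uk = interpret_york("UK")
from_africa = interpret_york("Africa")
```

For an ambiguous region such as France in this encoding, a search engine could mix results for both interpretations instead of committing to one.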
  35. 35. Query Semantics • “New” appears alone a lot. • “York” sometimes appears without “New”. • The combinations of phrases in documents help search engines relate these things to each other, or differentiate them. • How documents use the query phrases determines how people search. • How people search affects how people use query phrases.
  36. 36. Query Semantics • Bonus: Is it worth indexing? • Even if 1,000,000 searches happen every day? • What are the synonyms of facial expressions?
  37. 37. Query Semantics • “Prove the cost is worth it”. • Are you worth that cost if you do not use lexicosemantics?
  38. 38. Let’s talk about “porn”. • This is Matt Cutts. • His first big task at Google was finding spammy, or sometimes not spammy but highly sexual, queries. • Why? • S A F E S E A R C H.
  39. 39. Let’s talk about “porn”. • And how do you find all this porn? • How do people search for porn? • Matt Cutts was an expert on Web Spam, because adult websites use spam a lot. • “Think twice if your manager asks you what you think about porn.” • -Matt Cutts
  40. 40. Let’s talk about “porn”. • Matt Cutts used 69 languages and synonyms to find good phrases related to porn. • “I didn’t think about this before. People search for porn with lots of different, weird words.” • Matt Cutts tried to convince Google employees to search for porn in weird ways. • He handed out “cookies”; this is how the “Google Cookie Porn” events happened. • Lexicosemantics and Query Semantics were tested for the first time across the whole of Google.
  41. 41. Some Case Studies http://ktg.digital/Holistic-SEO-20 Kanbanize.com
  42. 42. Some Case Studies http://ktg.digital/Holistic-SEO-20 TheCooList.com
  43. 43. Some Case Studies http://ktg.digital/Holistic-SEO-20 TheComplaintsBoard.com
  44. 44. Some Case Studies http://ktg.digital/Holistic-SEO-20 Vava.cars
  45. 45. Some Case Studies http://ktg.digital/Holistic-SEO-20 Diyetkolik.com
  46. 46. Some Case Studies http://ktg.digital/Holistic-SEO-20 NDA.
  47. 47. Some Case Studies http://ktg.digital/Holistic-SEO-20 K9web.com