The need for more sophisticated search implementations is often at odds with the limited feature set available in modern out-of-the-box open source search engines.
This presentation discusses the challenges associated with properly modeling information within a domain, and why such domain modeling is critically needed.
AI, Search, and the Disruption of Knowledge Management - Trey Grainger
Trey Grainger discussed how search has evolved from basic keyword search to more advanced capabilities like understanding user intent, providing personalized search, and augmented search using machine learning and AI. He explained the concept of "reflected intelligence" where user interactions with search results are used to continuously improve search quality through techniques like signals boosting, learning to rank, and collaborative filtering. Grainger also outlined how knowledge graphs can help power semantic search by modeling relationships between entities to better understand queries and provide more relevant results.
The Enterprise Knowledge Graph is a disruptive platform that combines emerging Big Data and Graph technologies to reinvent knowledge management inside organizations. The platform aims to organize and distribute the organization's knowledge, making it centralized and universally accessible to every employee. The Enterprise Knowledge Graph is a central place to structure, simplify and connect the knowledge of an organization. By removing complexity, the knowledge graph brings more transparency, openness and simplicity into organizations. That leads to democratized communication and empowers individuals to share knowledge and to make decisions based on comprehensive knowledge. This platform can change the way we work, challenge the traditional hierarchical approach to getting work done, and help to unleash human potential!
Reflected Intelligence: Real-world AI in Digital Transformation - Trey Grainger
The goal of most digital transformations is to create competitive advantage by enhancing customer experience and employee success, so giving these stakeholders the ability to find the right information at their moment of need is paramount. Employees and customers increasingly expect an intuitive, interactive experience where they can simply type or speak their questions or keywords into a search box, have their intent understood, and immediately receive the best answers and content.
Providing this compelling experience, however, requires a deep understanding of your content, your unique business domain, and the collective and personalized needs of each of your users. Modern artificial intelligence (AI) approaches are able to continuously learn from both your content and the ongoing stream of user interactions with your applications, and to automatically reflect back that learned intelligence in order to instantly and scalably deliver contextually-relevant answers to employees and customers.
In this talk, we'll discuss how AI is currently being deployed across the Fortune 1000 to accomplish these goals, both in the digital workplace (helping employees more efficiently get answers and make decisions) and in digital commerce (understanding customer intent and connecting them with the best information and products). We'll separate fact from fiction as we break down the hype around AI and show how it is being practically implemented today to power many real-world digital transformations for the next generation of employees and customers.
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery” - VOGIN-academie
Smartlogic provides semantic search and content intelligence solutions to unlock business value from unstructured content. Their solution, Semaphore, uses natural language processing and machine learning to automatically enrich content with metadata, extract entities and facts, and categorize content according to customizable semantic models or ontologies. This helps organizations more effectively search, discover, and leverage information across diverse content sources. Semaphore delivers enhanced search capabilities, automated categorization, and tools to build and manage semantic models collaboratively. Customers report benefits such as reduced time spent searching, lower classification costs, and reduced risk of non-compliance by making more information accessible.
Natural Language Search with Knowledge Graphs (Chicago Meetup) - Trey Grainger
To optimally interpret most natural language queries, it's important to form a highly-nuanced, contextual interpretation of the domain-specific phrases, entities, commands, and relationships represented or implied within the search and within your domain.
In this talk, we'll walk through such a search system powered by Solr's Text Tagger and Semantic Knowledge Graph. We'll have fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "best bbq near activate" into:
{!func}mul(min(popularity,1),100) bbq^0.91032 ribs^0.65674 brisket^0.63386 doc_type:"restaurant" {!geofilt d=50 sfield="coordinates_pt" pt="38.916120,-77.045220"}
We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding like this within your search engine.
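As a rough sketch of the final assembly step implied above, the expanded query string can be built from the outputs of entity extraction and knowledge-graph traversal. The function, entity types, weights, and field names below are illustrative assumptions, not the talk's actual implementation.

```python
# Hypothetical sketch: assembling an expanded Solr query string like the one
# above from weighted related terms, a type filter, and a geo filter.
# All names and values here are illustrative assumptions.

def build_expanded_query(related_terms, doc_type, lat, lon, radius_km=50):
    """Combine weighted related terms, a document-type filter, and a
    Solr geofilt clause into a single query string."""
    boosted = " ".join(f"{term}^{weight:.5f}" for term, weight in related_terms)
    type_filter = f'doc_type:"{doc_type}"'
    geo_filter = (f'{{!geofilt d={radius_km} sfield="coordinates_pt" '
                  f'pt="{lat:.6f},{lon:.6f}"}}')
    return f"{boosted} {type_filter} {geo_filter}"

query = build_expanded_query(
    [("bbq", 0.91032), ("ribs", 0.65674), ("brisket", 0.63386)],
    doc_type="restaurant", lat=38.916120, lon=-77.045220)
print(query)
```

The term weights would come from the knowledge graph's relatedness scores, and the coordinates from resolving the tagged entity ("activate", the conference venue) to a location.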
The Next Generation of AI-powered Search - Trey Grainger
What does it really mean to deliver an "AI-powered Search" solution? In this talk, we’ll bring clarity to this topic, showing you how to marry the art of the possible with the real-world challenges involved in understanding your content, your users, and your domain. We'll dive into emerging trends in AI-powered Search, as well as many of the stumbling blocks found in even the most advanced AI and Search applications, showing how to proactively plan for and avoid them. We'll walk through the various uses of reflected intelligence and feedback loops for continuous learning from user behavioral signals and content updates, also covering the increasing importance of virtual assistants and personalized search use cases found within the intersection of traditional search and recommendation engines. Our goal will be to provide a baseline of mainstream AI-powered Search capabilities available today, and to paint a picture of what we can all expect just on the horizon.
"Searching for Meaning: The Hidden Structure in Unstructured Data". Presentation by Trey Grainger at the Southern Data Science Conference (SDSC) 2018. Covers linguistic theory, application in search and information retrieval, and knowledge graph and ontology learning methods for automatically deriving contextualized meaning from unstructured (free text) content.
Natural Language Search with Knowledge Graphs (Haystack 2019) - Trey Grainger
To optimally interpret most natural language queries, it is necessary to understand the phrases, entities, commands, and relationships represented or implied within the search. Knowledge graphs serve as useful instantiations of ontologies which can help represent this kind of knowledge within a domain.
In this talk, we'll walk through techniques to build knowledge graphs automatically from your own domain-specific content, how you can update and edit the nodes and relationships, and how you can seamlessly integrate them into your search solution for enhanced query interpretation and semantic search. We'll have some fun with some of the more search-centric use cases of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "bbq near haystack" into:
{ "filter": ["doc_type:restaurant"], "query": { "boost": { "b": "recip(geodist(38.034780,-78.486790),1,1000,1000)", "query": "bbq OR barbeque OR barbecue" } } }
We'll also specifically cover use of the Semantic Knowledge Graph, a particularly interesting knowledge graph implementation available within Apache Solr that can be auto-generated from your own domain-specific content and which provides highly-nuanced, contextual interpretation of all of the terms, phrases and entities within your domain. We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding within your search engine.
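To make the boost in the example request above concrete: Solr's recip(x, m, a, b) function computes a / (m*x + b), so documents near the target point get a boost close to 1.0 that decays with distance. The sketch below builds the same request body in Python; the helper names are invented for illustration.

```python
# Illustrative sketch of constructing the JSON request body shown above.
# recip(geodist(), 1, 1000, 1000) = 1000 / (distance_km + 1000), a decay
# curve that boosts nearby documents. Helper names are assumptions.

import json

def geo_decay(distance_km, m=1, a=1000, b=1000):
    """Mirror of Solr's recip(x, m, a, b) = a / (m*x + b) boost curve."""
    return a / (m * distance_km + b)

def build_request(lat, lon, synonyms):
    return {
        "filter": ["doc_type:restaurant"],
        "query": {"boost": {
            "b": f"recip(geodist({lat:.6f},{lon:.6f}),1,1000,1000)",
            "query": " OR ".join(synonyms),
        }},
    }

body = build_request(38.034780, -78.486790, ["bbq", "barbeque", "barbecue"])
print(json.dumps(body))
print(geo_decay(0.0), geo_decay(1000.0))  # 1.0 at the venue, 0.5 at 1000 km
```

The query expansion ("bbq OR barbeque OR barbecue") would come from the knowledge graph's synonym and related-term edges.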
Natural Language Search with Knowledge Graphs (Activate 2019) - Trey Grainger
The document discusses natural language search using knowledge graphs. It provides an overview of knowledge graphs and how they can help with natural language search. Specifically, it discusses how knowledge graphs can represent relationships and semantics in unstructured text. It also describes how semantic knowledge graphs are generated in Solr and how they can be used for tasks like query understanding, expansion and disambiguation.
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract—This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
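The core mechanism the abstract describes, nodes as terms and edges materialized from intersecting postings lists, can be sketched with a toy inverted index. The corpus and the lift-style relatedness score below are simplified stand-ins for the paper's actual scoring function.

```python
# Toy sketch of the Semantic Knowledge Graph mechanism: terms are nodes, and
# an edge between two terms materializes dynamically from the intersection of
# their postings lists. The foreground/background lift score here is a
# simplified stand-in for the paper's scoring function; the corpus is invented.

from collections import defaultdict

docs = [
    "data science machine learning",
    "machine learning neural networks",
    "data science predictive modeling",
    "golf club driver",
]

# Build the inverted index: term -> set of doc ids (its postings list).
index = defaultdict(set)
for doc_id, text in enumerate(docs):
    for term in text.split():
        index[term].add(doc_id)

def edge_weight(a, b, n_docs=len(docs)):
    """Score the dynamically materialized edge between terms a and b: how much
    more often b appears with a than corpus-wide chance would predict."""
    fg = index[a] & index[b]           # edge = intersecting postings lists
    if not fg:
        return 0.0
    p_fg = len(fg) / len(index[a])     # P(b | docs containing a)
    p_bg = len(index[b]) / n_docs      # P(b) across the whole corpus
    return p_fg / p_bg                 # lift > 1 suggests a real relationship

print(edge_weight("science", "data"))    # strongly related
print(edge_weight("science", "driver"))  # no shared documents -> 0.0
```

Because edges are computed on demand from corpus statistics, any combination of terms can be connected and scored without precomputing the full graph, which is the compactness property the paper emphasizes.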
The document discusses current and upcoming trends in search and AI. It notes that large datasets are less important than actionable intelligence. Assistive search using personalization, voice, images, conversations, context and providing answers and actions rather than just links is the new paradigm. The future of search and AI involves driving relevant interactions and experiences for customers through digital moments.
Interleaving, Evaluation to Self-learning Search @904Labs - John T. Kane
Presented at the Open Source Connections Haystack Relevance Conference: 904Labs' "Interleaving: from Evaluation to Self-Learning". 904Labs is the first to commercialize Online Learning to Rank as a state-of-the-art technique for self-learning search ranking that automatically takes customers' behavior into account to deliver personalized search results.
South Big Data Hub: Text Data Analysis Panel - Trey Grainger
Slides from Trey's opening presentation for the South Big Data Hub's Text Data Analysis Panel on December 8th, 2016. Trey provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse on where the industry is heading with regard to implementing more intelligent and relevant semantic search.
The document discusses how search tools have evolved to meet changing information needs. It notes that while Google is a popular search engine, understanding the different types of search tools is important. The document categorizes search engines based on what they index and how queries are processed, and evaluates search engine performance using metrics like recall and precision. Finally, it deconstructs the search experience and compares factors across various search engines, including Google.
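The recall and precision metrics mentioned above can be shown with a quick worked example for a single query; the document ids below are invented purely for illustration.

```python
# Worked example of the precision and recall metrics for one query.
# The document id sets are invented for illustration.

retrieved = {1, 2, 3, 4, 5}           # documents the engine returned
relevant  = {3, 4, 5, 6, 7, 8}        # documents a judge marked relevant

hits = retrieved & relevant           # relevant documents that were returned
precision = len(hits) / len(retrieved)  # fraction of results that are relevant
recall    = len(hits) / len(relevant)   # fraction of relevant docs that were found
print(precision, recall)  # 0.6 0.5
```

The two metrics trade off against each other: returning more documents tends to raise recall while lowering precision, which is why both are reported when evaluating a search engine.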
Text analytics is used to extract structured data from unstructured text sources like social media posts, reviews, emails and call center notes. It involves acquiring and preparing text data, then processing and analyzing it with algorithms like decision trees, Naive Bayes, support vector machines and k-nearest neighbors to extract terms, entities, concepts and sentiment. The results are then visualized to support data-driven decision making for applications like measuring customer opinions and providing search capabilities. Popular tools for text analytics include RapidMiner, KNIME, SPSS and R.
Haystack: Learning to Rank in an Hourly Job Market - Xun Wang
The document discusses learning to rank models for job search rankings on an hourly job marketplace platform. It describes:
1) The complexity of matching job seekers to job postings given the many factors involved and limited historical data.
2) An iterative process of developing learning to rank models, testing improvements through A/B testing, and analyzing results to further tune the models over time.
3) Key factors considered in the models include job title/description matches, employer name, location matches, distance between seeker and job, and search/user attributes. Performance is evaluated on multiple metrics like application and conversion rates.
The Apache Solr Semantic Knowledge Graph - Trey Grainger
What if instead of a query returning documents, you could alternatively return other keywords most related to the query: i.e. given a search for "data science", return me back results like "machine learning", "predictive modeling", "artificial neural networks", etc.? Solr’s Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields) allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does "driver" mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we'll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
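In recent Solr versions, this term-to-term traversal is exposed through the relatedness() aggregation of the JSON Facet API. The request below sketches how the "data science" example from the abstract could be expressed; the field name ("body") and parameter choices are assumptions that depend on your schema.

```python
# Hedged sketch of a Solr JSON Facet API request using the relatedness()
# aggregation, which backs the Semantic Knowledge Graph in recent Solr
# versions. The field name "body" is an assumption; adjust to your schema.

import json

request = {
    "params": {
        "fore": 'body:"data science"',  # foreground: docs matching the query
        "back": "*:*",                  # background: the whole corpus
    },
    "query": "*:*",
    "limit": 0,
    "facet": {
        "related_terms": {
            "type": "terms",
            "field": "body",
            "limit": 10,
            "sort": {"r": "desc"},      # rank candidate terms by relatedness
            "facet": {"r": "relatedness($fore,$back)"},
        }
    },
}
print(json.dumps(request, indent=2))
```

POSTing this body to a collection's /select endpoint would return the terms whose foreground distribution most exceeds their background distribution, e.g. "machine learning" or "predictive modeling" for a "data science" foreground, per the abstract's example.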
Thought Vectors and Knowledge Graphs in AI-powered Search - Trey Grainger
While traditional keyword search is still useful, pure text-based keyword matching is quickly becoming obsolete; today, it is a necessary but not sufficient tool for delivering relevant results and intelligent search experiences.
In this talk, we'll cover some of the emerging trends in AI-powered search, including the use of thought vectors (multi-level vector embeddings) and semantic knowledge graphs to contextually interpret and conceptualize queries. We'll walk through some live query interpretation demos to demonstrate the power that can be delivered through these semantic search techniques leveraging auto-generated knowledge graphs learned from your content and user interactions.
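The conceptual matching that vector embeddings enable can be sketched in a few lines: queries and documents are compared by the cosine of their embedding vectors rather than by shared keywords. The tiny hand-made vectors below are purely illustrative; real systems use learned embeddings with hundreds of dimensions.

```python
# Minimal sketch of vector-based conceptual matching: similarity is the
# cosine of the angle between embedding vectors, so documents about related
# concepts score highly even with no keyword overlap. Vectors are invented.

import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

query_vec = [0.9, 0.1, 0.0]   # embedding of the query "machine learning"
doc_a     = [0.8, 0.3, 0.1]   # article about predictive modeling
doc_b     = [0.0, 0.2, 0.9]   # article about golf equipment

# The conceptually related document wins despite sharing no keywords.
print(cosine(query_vec, doc_a) > cosine(query_vec, doc_b))  # True
```

Knowledge graphs complement this by supplying explicit, inspectable relationships where pure vector similarity is opaque, which is the combination the talk explores.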
Balancing the Dimensions of User Intent - Trey Grainger
The document discusses various approaches to AI-powered search, including content understanding through keyword search, user understanding through collaborative recommendations, and combining the two through personalized search. It then covers domain understanding using knowledge graphs, combining domain and user understanding through domain-aware matching, and combining content and domain understanding through semantic search. Finally, it discusses balancing keyword, vector, and knowledge graph search approaches.
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine - Trey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will describe how to overcome this by leveraging Lucene/Solr to power a knowledge graph that can extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships. For example, if a user types in (Senior Java Developer Portland, OR Hadoop), you or I know that the term “senior” designates an experience level, that “java developer” is a job title related to “software engineering”, that “portland, or” is a city with a specific geographical boundary, and that “hadoop” is a technology related to terms like “hbase”, “hive”, and “map/reduce”. Out of the box, however, most search engines just parse this query as text:((senior AND java AND developer AND portland) OR (hadoop)), which is not at all what the user intended. We will discuss how to train the search engine to parse the query into this intended understanding, and how to reflect this understanding to the end user to provide an insightful, augmented search experience. Topics: Semantic Search, Finite State Transducers, Probabilistic Parsing, Bayes Theorem, Augmented Search, Recommendations, NLP, Knowledge Graphs
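The kind of query parsing described above can be sketched as a greedy longest-match tagger over a phrase lexicon. Real systems use finite state transducers (e.g. Solr's Text Tagger) over large knowledge graphs; the lexicon and function below are invented for illustration.

```python
# Hypothetical sketch of intent-aware query parsing: tag known phrases with
# their semantic type using greedy longest-match over a small dictionary.
# The lexicon entries are invented for illustration; production systems use
# finite state transducers over a full knowledge graph.

LEXICON = {
    "senior": ("experience_level", "senior"),
    "java developer": ("job_title", "java developer"),
    "portland, or": ("city", "portland, or"),
    "hadoop": ("skill", "hadoop"),
}

def parse_query(query):
    """Greedily tag the longest known phrase at each position; anything
    unrecognized falls through as a plain keyword."""
    tokens = query.lower().split()
    parsed, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):   # longest span first
            phrase = " ".join(tokens[i:j])
            if phrase in LEXICON:
                parsed.append(LEXICON[phrase])
                i = j
                break
        else:
            parsed.append(("keyword", tokens[i]))
            i += 1
    return parsed

print(parse_query("Senior Java Developer Portland, OR Hadoop"))
```

The longest-match rule is what prevents "portland, or" from being misread as the city "portland" plus a boolean OR, the exact failure mode the talk calls out.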
This document is a project report submitted by four students - Anil Shrestha, Bijay Sahani, Bimal Shrestha, and Deshbhakta Khanal - to the Department of Electronics and Computer Engineering at Tribhuvan University in partial fulfillment of the requirements for a Bachelor's degree in Computer Engineering. The report details the development of a web application called "Tweezer" to perform sentiment analysis on tweets in order to determine public sentiment towards various products, services, or personalities. Literature on previous work related to sentiment analysis, especially on social media data like tweets, is also reviewed in the report.
Python for Data Science - Python Brasil 11 (2015) - Gabriel Moreira
This talk demonstrates a complete Data Science process, involving Obtaining, Scrubbing, Exploring, Modeling and Interpreting data using Python ecosystem tools like the IPython Notebook, Pandas, Matplotlib, NumPy, SciPy and Scikit-learn.
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery” - voginip
Smartlogic provides semantic search and content intelligence solutions to unlock business value from unstructured content. Their software, Semaphore, uses natural language processing and machine learning to build ontologies and automatically annotate content with metadata, enabling more sophisticated search and discovery of hidden knowledge within large volumes of documents. Semaphore integrates with various systems and delivers benefits such as cost savings from more efficient content exploration, risk reduction through improved compliance, and competitive advantages from making better use of organizational intelligence in content.
Mike King examines the state of the SEO industry and explains how an understanding of information retrieval can improve our understanding of Google. This talk debuted at MozCon.
Search Solutions 2011: Successful Enterprise Search By Design - Marianne Sweeny
When your colleagues say they want Google, they don’t mean the Google Search Appliance. They mean the Google Search user experience: pervasive, expedient and delivering the information that they need. Successful enterprise search does not start with the application features, is not part of the information architecture, does not come from a controlled vocabulary and does not emerge on its own from the developers. It requires enterprise-specific data mining, enterprise-specific user-centered design and fine-tuning to turn “search sucks” into search success within the firewall. This presentation looks at action items, tools and deliverables for the Discovery, Planning, Design and Post Launch phases of an enterprise search deployment.
Searching the ever-growing amount of global data and research results, and retrieving only the relevant and up-to-date information, is becoming more and more challenging. The volume of data, including big data in the IoT world, makes it even more so. How can employees keep themselves up to date and ensure their work incorporates the most relevant and latest information? Most search engines today provide some form of semantically based answers to the queries you enter. However, most search engines do not know you well enough to provide the best answers based on who you are and what you really want. That is today's challenge, combined with the growing amount of data and media it is found in. The answer might be closer than you think.
The document discusses an internship report on iOS technology. The intern installed Xcode 6.4 and learned Objective-C programming. They built an iOS application using Xcode, gathered requirements from the design team, and worked on product documentation.
The document discusses the emergence of the semantic web, which aims to make data on the web more interconnected and machine-readable. It describes Tim Berners-Lee's vision of a "Giant Global Graph" that connects all web documents based on what they are about rather than just linking documents. This would allow user data and profiles to be seamlessly shared across different sites without having to re-enter the same information. The semantic web uses standards like RDF, RDFS and OWL to represent relationships between data in a graph structure and enable automated reasoning. Several companies are working to build applications that take advantage of this interconnected semantic data.
Search engines are designed to help users find information stored digitally. They aim to minimize the time and amount of information needed to find what users are looking for. Major methods of information retrieval for search engines include Boolean, vector space model, probabilistic, and meta search. Designing the perfect search engine requires dealing with challenges like the web's huge and constantly changing document set that is loosely organized through hyperlinks. Effective search requires components like crawlers to discover pages, repositories to store them, indexes for efficient searching, and ranking algorithms to order results.
Natural Language Search with Knowledge Graphs (Activate 2019)Trey Grainger
The document discusses natural language search using knowledge graphs. It provides an overview of knowledge graphs and how they can help with natural language search. Specifically, it discusses how knowledge graphs can represent relationships and semantics in unstructured text. It also describes how semantic knowledge graphs are generated in Solr and how they can be used for tasks like query understanding, expansion and disambiguation.
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract—This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
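The mechanism described above can be illustrated in miniature: terms are nodes, and an edge between two terms materializes from the documents in the intersection of their postings lists. The sketch below scores edges with simple Jaccard overlap rather than the paper's actual relatedness measure, and the toy corpus is invented for illustration:

```python
from collections import defaultdict

def build_postings(docs):
    """Map each term to the set of document ids containing it (an inverted index)."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return postings

def edge_weight(postings, a, b):
    """Materialize the edge between terms a and b from their intersecting postings
    lists. Scored here with Jaccard overlap; the paper uses a richer relatedness
    score derived from foreground vs. background corpus statistics."""
    docs_a, docs_b = postings[a], postings[b]
    if not (docs_a & docs_b):
        return 0.0
    return len(docs_a & docs_b) / len(docs_a | docs_b)

docs = [
    "machine learning and data science",
    "data science uses machine learning",
    "golf clubs and drivers",
]
postings = build_postings(docs)
print(edge_weight(postings, "machine", "learning"))  # strong edge: shared documents
print(edge_weight(postings, "machine", "golf"))      # no edge: disjoint postings
```

Because edges are computed from postings lists on demand, any pair (or set) of terms can have an edge materialized and scored without the graph ever being stored explicitly, which is the paper's central point.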
The document discusses current and upcoming trends in search and AI. It notes that large datasets are less important than actionable intelligence. Assistive search using personalization, voice, images, conversations, context and providing answers and actions rather than just links is the new paradigm. The future of search and AI involves driving relevant interactions and experiences for customers through digital moments.
Interleaving, Evaluation to Self-learning Search @904Labs - John T. Kane
Presented at Open Source Connections Haystack Relevance Conference on 904Labs' "Interleaving: from Evaluation to Self-Learning". 904Labs is the first to commercialize "Online Learning to Rank" as a state-of-art for technical Self-learning Search Ranking that automatically takes into account your customers human behaviors for personalized search results.
South Big Data Hub: Text Data Analysis Panel - Trey Grainger
Slides from Trey's opening presentation for the South Big Data Hub's Text Data Analysis Panel on December 8th, 2016. Trey provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse on where the industry is heading with regard to implementing more intelligent and relevant semantic search.
The document discusses how search tools have evolved to meet changing information needs. It notes that while Google is a popular search engine, understanding different types of search tools is important. The document categorizes search engines based on what they index and how queries are processed. It also evaluates search engine performance using metrics like recall and precision. Finally, it deconstructs the search experience and considers factors of various search engines like Google.
Text analytics is used to extract structured data from unstructured text sources like social media posts, reviews, emails and call center notes. It involves acquiring and preparing text data, processing and analyzing it using algorithms like decision trees, naive bayes, support vector machines and k-nearest neighbors to extract terms, entities, concepts and sentiment. The results are then visualized to support data-driven decision making for applications like measuring customer opinions and providing search capabilities. Popular tools for text analytics include RapidMiner, KNIME, SPSS and R.
Haystack - Learning to rank in an hourly job market - Xun Wang
The document discusses learning to rank models for job search rankings on an hourly job marketplace platform. It describes:
1) The complexity of matching job seekers to job postings given the many factors involved and limited historical data.
2) An iterative process of developing learning to rank models, testing improvements through A/B testing, and analyzing results to further tune the models over time.
3) Key factors considered in the models include job title/description matches, employer name, location matches, distance between seeker and job, and search/user attributes. Performance is evaluated on multiple metrics like application and conversion rates.
The Apache Solr Semantic Knowledge Graph - Trey Grainger
What if instead of a query returning documents, you could alternatively return other keywords most related to the query: i.e. given a search for "data science", return me back results like "machine learning", "predictive modeling", "artificial neural networks", etc.? Solr’s Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields) allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does "driver" mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we'll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
Thought Vectors and Knowledge Graphs in AI-powered Search - Trey Grainger
While traditional keyword search is still useful, pure text-based keyword matching is quickly becoming obsolete; today, it is a necessary but not sufficient tool for delivering relevant results and intelligent search experiences.
In this talk, we'll cover some of the emerging trends in AI-powered search, including the use of thought vectors (multi-level vector embeddings) and semantic knowledge graphs to contextually interpret and conceptualize queries. We'll walk through some live query interpretation demos to demonstrate the power that can be delivered through these semantic search techniques leveraging auto-generated knowledge graphs learned from your content and user interactions.
Balancing the Dimensions of User Intent - Trey Grainger
The document discusses various approaches to AI-powered search, including content understanding through keyword search, user understanding through collaborative recommendations, and combining the two through personalized search. It then covers domain understanding using knowledge graphs, combining domain and user understanding through domain-aware matching, and combining content and domain understanding through semantic search. Finally, it discusses balancing keyword, vector, and knowledge graph search approaches.
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine - Trey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will describe how to overcome this by leveraging Lucene/Solr to power a knowledge graph that can extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships. For example, if a user types in (Senior Java Developer Portland, OR Hadoop), you or I know that the term “senior” designates an experience level, that “java developer” is a job title related to “software engineering”, that “portland, or” is a city with a specific geographical boundary, and that “hadoop” is a technology related to terms like “hbase”, “hive”, and “map/reduce”. Out of the box, however, most search engines just parse this query as text:((senior AND java AND developer AND portland) OR (hadoop)), which is not at all what the user intended. We will discuss how to train the search engine to parse the query into this intended understanding, and how to reflect this understanding to the end user to provide an insightful, augmented search experience. Topics: Semantic Search, Finite State Transducers, Probabilistic Parsing, Bayes Theorem, Augmented Search, Recommendations, NLP, Knowledge Graphs
This document is a project report submitted by four students - Anil Shrestha, Bijay Sahani, Bimal Shrestha, and Deshbhakta Khanal - to the Department of Electronics and Computer Engineering at Tribhuvan University in partial fulfillment of the requirements for a Bachelor's degree in Computer Engineering. The report details the development of a web application called "Tweezer" to perform sentiment analysis on tweets in order to determine public sentiment towards various products, services, or personalities. Literature on previous work related to sentiment analysis, especially on social media data like tweets, is also reviewed in the report.
Python for Data Science - Python Brasil 11 (2015) - Gabriel Moreira
This talk demonstrate a complete Data Science process, involving Obtaining, Scrubbing, Exploring, Modeling and Interpreting data using Python ecosystem tools, like IPython Notebook, Pandas, Matplotlib, NumPy, SciPy and Scikit-learn.
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery” - voginip
Smartlogic provides semantic search and content intelligence solutions to unlock business value from unstructured content. Their software, Semaphore, uses natural language processing and machine learning to build ontologies and automatically annotate content with metadata, enabling more sophisticated search and discovery of hidden knowledge within large volumes of documents. Semaphore integrates with various systems and delivers benefits such as cost savings from more efficient content exploration, risk reduction through improved compliance, and competitive advantages from making better use of organizational intelligence in content.
Mike King examines the state of the SEO industry and talks through how knowing information retrieval will help improve our understanding of Google. This talk debuted at MozCon.
Search Solutions 2011: Successful Enterprise Search By Design - Marianne Sweeny
When your colleagues say they want Google, they don’t mean the Google Search Appliance. They mean the Google Search user experience: pervasive, expedient and delivering the information that they need. Successful enterprise search does not start with the application features, is not part of the information architecture, does not come from a controlled vocabulary and does not emerge on its own from the developers. It requires enterprise-specific data mining, enterprise-specific user-centered design and fine tuning to turn “search sucks” into search success within the firewall. This presentation looks at action items, tools and deliverables for Discovery, Planning, Design and Post Launch phases of an enterprise search deployment.
Searching the ever-growing amount of global data and research results, and retrieving only the relevant and up-to-date information, becomes more and more challenging. The volume of data, including big data in the IoT world, makes it even more so. How can employees keep themselves up to date, incorporate the relevant information into their work, and ensure their work reflects the most relevant and latest information? Most search engines today provide some sort of semantic answers to the queries you enter. However, most search engines do not know you well enough to provide the best answers based on who you are and what you really want. That is today's challenge, combined with the growing amount of data and media it lives in. The answer might be closer than you think.
The document discusses an internship report on iOS technology. The intern installed Xcode 6.4 and learned Objective-C programming. They built an iOS application using Xcode, gathered requirements from the design team, and worked on product documentation.
The document discusses the emergence of the semantic web, which aims to make data on the web more interconnected and machine-readable. It describes Tim Berners-Lee's vision of a "Giant Global Graph" that connects all web documents based on what they are about rather than just linking documents. This would allow user data and profiles to be seamlessly shared across different sites without having to re-enter the same information. The semantic web uses standards like RDF, RDFS and OWL to represent relationships between data in a graph structure and enable automated reasoning. Several companies are working to build applications that take advantage of this interconnected semantic data.
Search engines are designed to help users find information stored digitally. They aim to minimize the time and amount of information needed to find what users are looking for. Major methods of information retrieval for search engines include Boolean, vector space model, probabilistic, and meta search. Designing the perfect search engine requires dealing with challenges like the web's huge and constantly changing document set that is loosely organized through hyperlinks. Effective search requires components like crawlers to discover pages, repositories to store them, indexes for efficient searching, and ranking algorithms to order results.
This document discusses using retrieval augmented generation (RAG) with Cosmos DB and large language models (LLMs) to power question answering applications. RAG combines information retrieval over stored data with text generation from LLMs to provide customized, up-to-date responses without requiring expensive model retraining. The key components of RAG include data storage, embedding models to index data, a vector database to store embeddings, retrieval of relevant embeddings, and an LLM orchestrator to generate responses using retrieved information as context. Azure Cosmos DB is highlighted as an effective vector database option for RAG applications.
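The retrieval step at the heart of RAG can be sketched independently of any particular database: embed the stored chunks and the query, rank chunks by cosine similarity, and pass the top hits to the LLM as context. In the sketch below, a bag-of-words vector stands in for a real embedding model, and an in-memory list stands in for a vector store such as Cosmos DB; the sample chunks are invented:

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: a bag-of-words term-frequency vector.
    A real RAG system would call an embedding model here."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Rank stored chunks against the query embedding; the top-k results
    would be inserted into the LLM prompt as grounding context."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Cosmos DB can store vector embeddings",
    "LLMs generate text from a prompt",
    "Retrieval augmented generation combines retrieval with generation",
]
print(retrieve("how does retrieval augmented generation work", chunks, k=1))
```

The LLM orchestrator then formats the retrieved chunks plus the user question into a single prompt, which is why responses stay current without retraining the model.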
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom... - Cataldo Musto
This document provides an overview and agenda for a tutorial on semantics-aware techniques for social media analysis, user modeling, and recommender systems. The tutorial will discuss how to represent content to improve information access and build new services for social media. It will cover why intelligent information access is needed to effectively cope with information overload, and how semantics can be introduced through natural language processing and by encoding endogenous and exogenous semantics. The agenda includes explaining recommendations, semantic user profiles based on social data, and semantic analysis of social streams.
This is a presentation I delivered at Enterprise Data World 2018 to make the case for developing intelligent systems using a hybrid or blended approach combining statistical-based machine learning with knowledge-based approaches that involve ontologies, taxonomies or knowledge graphs.
This document discusses query entity recognition (QER), which seeks to locate and classify elements in text queries into predefined categories like names, organizations, locations, etc. It describes challenges like differentiating similar entities and balancing free text for training. The document outlines approaches to QER like string matching, probabilistic shallow parsing using conditional random fields, and a hybrid method. It provides details on the features of the QER system, such as processing speed, integration formats, and evaluation metrics. Future directions are mentioned, like expanding QER into a complete query dynamics system.
The document describes a proposed patent search system that aims to improve the usability of patent searches. It discusses modules for login, query processing, error correction, query suggestion, ranking results, and partitioning patents. The goal is to make the search process easier for users by correcting errors, expanding queries, and efficiently retrieving the most relevant results. Key techniques include topic modeling for suggestions, error correction using tries, and partitioning patents into groups for faster searching.
This talk will feature some of my recent research into alternative uses for Solr facets and facet metadata. I will develop the idea that facets can be used to discover similarities between items and attributes in a search index, and show some interesting applications of this idea. A common takeaway is that using facets and facet metadata in non-conventional ways enables the semantic context of a query to be automatically tuned. This has important implications for user-centric and semantically focused relevance.
The document discusses MapReduce and how it can be used for sequential web-access-based recommendation systems. It explains that MapReduce separates large, unstructured data processing from computation, allowing it to run efficiently on many machines. A MapReduce job could process web server logs to build a pattern tree for recommendations, with the tree continuously updated from new data. When making recommendations for a user, their access pattern would be compared to the tree generated from all user data.
professional fuzzy type-ahead rummage around in xml type-ahead search techni... - Kumar Goud
Abstract – This is a research venture on the new information-access paradigm called type-ahead search, in which systems discover answers to a keyword query on-the-fly as users type it. In this paper we study how to support fuzzy type-ahead search in XML. Supporting fuzzy search is important when users have limited knowledge about the exact representation of the entities they are looking for, such as people records in an online directory. We have developed and deployed several such systems, some of which are used by many people on a daily basis. The systems received overwhelmingly positive feedback from users due to their friendly interfaces with the fuzzy-search feature. We describe the design and implementation of the systems and demonstrate several of them. We show that our efficient techniques allow this search paradigm to scale to large amounts of data.
Index Terms - type-ahead, large data set, server side, online directory, search technique.
Object-oriented analysis and design is an evolutionary development method built upon past proven concepts. The document discusses object-oriented systems development processes, including use-case-driven analysis, the Object Modeling Technique (OMT), class diagrams, relationships between classes, and object-oriented modeling. It provides examples of class diagrams showing classes, attributes, operations, and relationships. It also explains the views of OMT - the object model, dynamic model, and functional model - and how OMT separates modeling concerns.
I was invited to speak at OMCap Berlin 2014 about the close relationship between search engines and user experience with prescriptive guidance to gain higher rankings and more conversions.
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine... - Gabriel Moreira
This talk introduces the main techniques of Recommender Systems and Topic Modeling. Then, we present a case of how we've combined those techniques to build Smart Canvas, a SaaS that allows people to bring, create and curate content relevant to their organization, and also helps to tear down knowledge silos.
We give a deep dive into the design of our large-scale recommendation algorithms, giving special attention to a content-based approach that uses topic modeling techniques (like LDA and NMF) to discover people’s topics of interest from unstructured text, and social-based algorithms using a graph database connecting content, people and teams around topics.
Our typical data pipeline includes the ingestion of millions of user events (using Google PubSub and BigQuery), the batch processing of the models (with PySpark, MLlib, and Scikit-learn), the online recommendations (with Google App Engine, Titan Graph Database and Elasticsearch), and the data-driven evaluation of UX and algorithms through A/B testing. We also touch on non-functional requirements of a software-as-a-service, like scalability, performance, availability, reliability and multi-tenancy, and how we addressed them in a robust architecture deployed on Google Cloud Platform.
Short bio: Gabriel Moreira is a scientist passionate about solving problems with data. He is Head of Machine Learning at CI&T and a doctoral student at Instituto Tecnológico de Aeronáutica (ITA), where he also earned his Master of Science. His current research interests are recommender systems and deep learning.
https://www.meetup.com/pt-BR/machine-learning-big-data-engenharia/events/239037949/
5. If knowledge is wealth, then knowledge about our domain, and the ability to model it accurately, could be said to be “value”.
“Value” is the proposition that drives users to engage with search.
6. In computer graphics, 3D models are simplified for real-time applications (video games).
Fidelity is preserved by applying a high-fidelity proxy to the lower-fidelity “real-time” representation.
This process is called “baking”.
7. In machine learning, when we ‘train’ a model, we are
‘baking’ knowledge into a more efficient representation.
The same is true for how we might enhance searches
by using external datasets, query statistics, LTR, etc.
Modeling a high-fidelity representation of data into a
real-time, more efficient form is key to climbing the
ladder of search sophistication.
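The “baking” analogy can be made concrete with signals boosting: raw user-interaction logs are aggregated offline into a compact query-to-document boost table that is cheap to consult at query time. A minimal sketch, assuming a hypothetical click log of (query, clicked-document) pairs:

```python
from collections import defaultdict

def bake_boosts(click_log):
    """Offline 'baking' step: aggregate raw click signals into a compact
    query -> {doc: weight} table usable for boosting at query time."""
    counts = defaultdict(lambda: defaultdict(int))
    for query, doc_id in click_log:
        counts[query][doc_id] += 1
    # Normalize each query's click counts into weights summing to 1.0,
    # giving a small, efficient representation of the raw behavioral data.
    boosts = {}
    for query, doc_counts in counts.items():
        total = sum(doc_counts.values())
        boosts[query] = {d: c / total for d, c in doc_counts.items()}
    return boosts

# Hypothetical click log: (query, clicked document id)
log = [("ipad", "doc1"), ("ipad", "doc1"), ("ipad", "doc2"), ("laptop", "doc3")]
boosts = bake_boosts(log)
print(boosts["ipad"])  # doc1 boosted twice as strongly as doc2
```

At serve time the engine only looks up the baked table, just as a game engine samples a pre-baked texture instead of recomputing lighting per frame.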
8. Representing domain knowledge within our search
platform so that it provides value to our users is how
we achieve sophistication.
This is perhaps the greatest challenge in building
search products.
9. Our premise
Intent IS accuracy, recall IS relevancy
This may be controversial; recall vs accuracy is the wrong juxtaposition.
10. Our premise
Perhaps the best way this relationship can be described is:
- The fidelity of a domain model impacts recall
- Accuracy is linked to our domain model
- Relevancy is linked to accuracy
- Accuracy is best modeled by understanding intent
- Restrictive queries shouldn’t be presumed to be accurate.
Accuracy exists independent of the percent of documents matched.
11. Our premise
If accuracy is the ultimate goal, and recall is a part of accuracy, how do we go about achieving it?
14. Modeling Knowledge
Let’s return to discussing sophistication. Earlier we made the claim that knowledge is what provides value. We also said that modeling knowledge is difficult.
Implementing maturity in our search platform is what allows us to model our domain knowledge.
16. Modeling Knowledge
Observation… It’s really hard for most organizations to climb the sophistication ladder
that was shown in the previous slide.
17. Out of the box
Scorer (default similarity)
Query Handler (Edismax)
Import Handlers
Analyzers / TokenFilters
Boost Functions
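For context, this out-of-the-box tier amounts to request configuration like the following. The parameter names (defType, qf, pf, mm, bf) are real Solr edismax parameters; the field names (title, description, last_modified) are placeholders for illustration:

```python
def edismax_params(user_query):
    """Build a typical out-of-the-box Solr edismax request as URL parameters.
    Parameter names are standard Solr params; field names are placeholders."""
    return {
        "q": user_query,
        "defType": "edismax",          # the Extended DisMax query parser
        "qf": "title^3 description",   # query fields with static boosts
        "pf": "title^5",               # phrase-match boosting
        "mm": "2<75%",                 # minimum-should-match rule
        # Boost function: a common recency-boost pattern over a date field.
        "bf": "recip(ms(NOW,last_modified),3.16e-11,1,1)",
    }

params = edismax_params("senior java developer")
print(params["defType"])
```

Everything here is static, hand-tuned configuration; none of it adapts to user behavior, which is exactly the gap the next slide's list addresses.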
18. What we need
Query Classifiers
ML Models
Behavior Sampling / Ingestion
Identity Awareness
Secondary Data Sources (data connectors)
Alternative forms of storage (beyond the inverted index)
Integrations (Spark, Airflow, etc)
Collections as “Containers” for behavior.
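None of these ship out of the box. A query classifier, for example, can begin life as a hand-written keyword-rule model before being replaced by an ML model trained on labeled query logs; the categories and rules below are invented for illustration:

```python
def classify_query(query):
    """Toy rule-based query classifier. A production system would train an
    ML model on labeled query logs instead of hand-written keyword rules."""
    q_terms = set(query.lower().split())
    rules = {
        "job_title": {"developer", "engineer", "manager"},
        "location": {"portland", "remote", "nyc"},
        "skill": {"java", "hadoop", "python"},
    }
    labels = {cat for cat, terms in rules.items() if terms & q_terms}
    return labels or {"unknown"}

print(classify_query("senior java developer portland"))
```

The classifier's output can then route the query to different handlers, similarity models, or boost profiles, which is why it sits at the top of this list.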
19. Modeling Knowledge
When we model our domain we want to model “things”, that we
can call “entities”.
Modeling entities in any domain can be extremely valuable.
25. Modeling Knowledge - Entities
1.) disambiguation for free
2.) fairly easy to generate candidates for any domain
3.) fairly well researched area of ML
4.) helps in the modeling of “conceptual” synonyms
5.) must be pruned by user feedback / behavior
6.) ground work for higher-level more sophisticated features.
28. Modeling Knowledge - Entities
Ok, but why ?
In the previous slide we saw that 40% of Target Corporation’s searches are low-information; it isn’t clear from the query alone what the user means. Without modeling your corpus (the content you are searching), you won’t be able to reason about the behavior of, or relationships between, searches, actions, and ultimately intent.
It is extremely common for a good portion of searches (around half) not to provide the information necessary for relevant term-based search results.
This is at the core of the case for sophistication. Term search simply can’t provide useful results for a large number of the searches your users will perform.
30. Modeling Knowledge – Truth Systems
entity     feature   value
plato      isA       philosopher
socrates   isA       philosopher
plato      knew      socrates
socrates   knew      plato
plato      isA       historical-figure
socrates   isA       historical-figure
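A truth system like the table above can be sketched as a set of (entity, feature, value) triples with simple lookups over it; the helper names here are illustrative:

```python
# A minimal "truth system": (entity, feature, value) triples mirroring
# the table above, plus simple lookups over them.
triples = {
    ("plato", "isA", "philosopher"),
    ("socrates", "isA", "philosopher"),
    ("plato", "knew", "socrates"),
    ("socrates", "knew", "plato"),
    ("plato", "isA", "historical-figure"),
    ("socrates", "isA", "historical-figure"),
}

def values(entity, feature):
    """All values asserted for an entity's feature."""
    return {v for e, f, v in triples if e == entity and f == feature}

def entities_with(feature, value):
    """All entities for which feature=value is asserted."""
    return {e for e, f, v in triples if f == feature and v == value}

print(values("plato", "isA"))
print(entities_with("isA", "philosopher"))
```

A real system would back this with a graph store, but the query shapes stay the same.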
31. Modeling Knowledge - Similarity
Socrates != Plato
- Related, but not the same
- One is not a subset of the other
- Found in many of the same documents
- Found in many of the same contexts
- This is where automatic similarity methods fall down a bit.
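The falldown is easy to demonstrate: because Plato and Socrates appear in the same documents and contexts, a plain co-occurrence similarity scores them as highly similar even though they are distinct entities. A toy illustration (documents and numbers are invented for the example):

```python
import math
from collections import Counter

docs = [
    "plato and socrates debated virtue in athens",
    "socrates taught plato about ethics",
    "plato wrote dialogues featuring socrates",
]

def context_vector(term, docs):
    """Counts of words co-occurring with term across documents."""
    vec = Counter()
    for d in docs:
        words = d.split()
        if term in words:
            vec.update(w for w in words if w != term)
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# High similarity, yet Socrates != Plato: co-occurrence alone
# cannot distinguish "related" from "the same".
print(cosine(context_vector("plato", docs), context_vector("socrates", docs)))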
32. Modeling Knowledge - Ontologies
Entities and ontologies can work together...
- when building ontologies there are different types of relationships.
- word2vec / phrase2vec and LSA cannot be used by themselves.
- ontologies can be pruned and reshaped by supervised learning.
- ontologies can be reshaped by feature-systems (truth systems).
- most useful ontologies are modeled for a specific feature (product titles).
- query classifier can choose between similarity features / models.
34. Modeling Knowledge
We can’t simply rely on our corpus to provide us with the information necessary to model
our domain. We must use auxiliary data sources.
Fortunately, there are many open data sources in the world that we can use to augment
our understanding of our corpus.
38. In the previous slides we saw entity mapping and grading of a job-search domain model. This
was accomplished by building candidate phrases and then pruning them with an SVM trained on
features from a known-good data source whose phrases and topics were already labeled.
Also shown was a query classifier that takes a lazy or poorly constructed query, groups the
components of the query logically, and expands part of the query based on what it knows about
the index and the availability and relatedness of terms.
A model to classify queries can be built by understanding the relationship between search
entities and the entities and information contained within a document.
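A stripped-down sketch of the grouping step such a query classifier performs: match query tokens against a dictionary of known entities, longest phrase first, so a lazily typed query becomes typed components. The dictionary, types, and function are hypothetical stand-ins for the trained model described above:

```python
# Hypothetical entity dictionary; a real system would learn this.
entity_types = {
    "java developer": "job_title",
    "senior": "seniority",
    "atlanta": "location",
}

def classify(query, entities=entity_types):
    """Group query tokens into typed entity phrases via longest match."""
    tokens = query.lower().split()
    parts, i = [], 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 0, -1):  # longest phrase first
            phrase = " ".join(tokens[i:i + n])
            if phrase in entities:
                parts.append((phrase, entities[phrase]))
                i += n
                break
        else:
            parts.append((tokens[i], "unknown"))
            i += 1
    return parts

print(classify("senior java developer atlanta"))
```

Once components are typed, expansion (e.g., adding related job titles) can be applied per type rather than per raw token.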
39. SHReC is a Java package implementing a hierarchical document clustering algorithm based on a
statistical co-occurrence measure called subsumption.
The algorithm is particularly suited to the problem of on-line "search results" clustering, requiring
only small amounts of text data. - http://shrec.sourceforge.net/
(Diagram: Search → Action → Document)
SHReC along with an entity model can be used to prune, grade, and reorganize an ontology to better
understand the types and accuracy of relationships. Algorithms used to cluster behavior with search
terms are invaluable in modeling search intent and rewriting search queries.
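The subsumption test SHReC builds on is simple to sketch: term x subsumes term y when x appears in most documents containing y, but not vice versa, which yields broader-than relationships for free. The threshold and toy documents below are illustrative:

```python
def subsumes(x, y, docs, threshold=0.8):
    """True if x subsumes y: P(x|y) is high while P(y|x) is lower."""
    docs_x = {i for i, d in enumerate(docs) if x in d}
    docs_y = {i for i, d in enumerate(docs) if y in d}
    if not docs_x or not docs_y:
        return False
    both = len(docs_x & docs_y)
    p_x_given_y = both / len(docs_y)
    p_y_given_x = both / len(docs_x)
    return p_x_given_y >= threshold and p_y_given_x < p_x_given_y

# Toy corpus as sets of terms per document.
docs = [
    {"nurse", "registered"},
    {"nurse", "icu"},
    {"nurse", "pediatric"},
    {"nurse"},
]
print(subsumes("nurse", "icu", docs))  # "nurse" is the broader term
```

Running this over search terms and clicked documents is one way to grade and reorganize ontology edges by observed behavior.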
40. How best to combine phrase boosting, multi-term synonyms, term position (proximity), and
performance is a frequent question within the community.
41. Exact Phrase Matches → PhraseQuery / SpanQuery
Proximity of Terms → SpanQuery
Related Phrases → Payloads / Index Time Synonyms
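On the phrase-boosting side, Solr's edismax parser exposes this directly: pf boosts whole-query phrase matches, pf2 boosts bigram phrases, and ps sets the proximity slop. A sketch of a parameter builder (the field names and boost weights are illustrative; the parameter names are standard edismax):

```python
def edismax_params(query):
    """Build Solr edismax params combining term, phrase, and proximity boosts."""
    return {
        "defType": "edismax",
        "q": query,
        "qf": "title^2 body",      # plain term matches
        "pf": "title^10 body^4",   # whole-query phrase boost
        "pf2": "title^5 body^2",   # bigram (word-pair) phrase boost
        "ps": 2,                   # phrase slop: proximity tolerance
    }

print(edismax_params("senior java developer"))
```

The SpanQuery cases above have no parameter-level equivalent and typically require a custom query parser.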
42. Currently in Solr there is no built-in way to represent related entities efficiently. Query rewriting or
expansion can be performed at query time, but not all relationships can be modeled at query time
due to the complexity of the query.
Different classifications of synonyms within the index are an option, as is using payloads to
assign relatedness scores to a given entity.
All index-side synonym solutions are quite custom and not easy to implement quickly.
Better tools are needed to correctly model graphs of terms or entities and to create rules for how
and when to rewrite search queries without resorting to crude rule-based systems.
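What a payload-style scheme stores can be approximated as follows: at index time each term is expanded with related entities carrying a relatedness weight, so a partial match scores at a discount rather than all-or-nothing. The expansion table and weights here are invented for illustration:

```python
# Hypothetical relatedness table; in Solr these weights would live in
# term payloads rather than a Python dict.
related = {
    "registered nurse": [("rn", 0.9), ("nurse", 0.7)],
}

def expand_terms(terms):
    """Expand indexed terms with weighted related entities."""
    out = [(t, 1.0) for t in terms]  # original terms at full weight
    for t in terms:
        out.extend(related.get(t, []))
    return out

print(expand_terms(["registered nurse", "icu"]))
```

A scorer consuming the weights would then rank an "rn" match on a "registered nurse" document slightly below an exact match.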
44–46. Conclusion
- Modeling the world through language is hard.
- Modeling phrases and entities makes life a little easier.
- Phrases form the basis of relationships.
- Accuracy should be proportional to confidence.
Editor's notes
With that said,
there is a relationship between recall and accuracy; that's not up for debate.
What is often missed in the discussion is the relationship between recall and intent.
To have “expanded” recall we must model our domain; to do this we need entities and an understanding of the relationships between them.
Intent is domain-specific, so we want to find easier ways to model it.
When we model intent we want to do all the things we do with search: test it, debug it, update it, reinforce it with judgements.
Solr and Elasticsearch provide good primitives out of the box.
The demands of modern search applications require more layered and sophisticated primitives.
So I've talked about modeling entities, but why do we need to do this? Can't we get most of the way there with traditional search? Hasn't it worked fine for most people until now?
This is a slide from a Target Corp presentation.
A lot of their searches are long-tail with poor matches.
Also, when their model can't match both words within a given category, they fall back to searching for only the one term that has the most matches.
This is a great example where understanding the relationship between searches and entities can be very important.
Single-term searches are a huge issue in job search, and interpreting what they mean is a challenge.
Modeling entities in our domain helps us begin to understand the relationship between term searches and the types of documents viewed.
Entity approaches can also help us understand the individual searcher's affinity within ambiguous contexts.
You can imagine a search system in which we are modeling people. This might be useful for a library or research system.
Which brings us to traditional challenges with similarity and where it can go off the rails.
Ontologies are a huge subject, but really what we are describing is a graph database with edges that are informed based on what we know about a particular entity.
One entity may be perfectly related to another.
If we were building a library data-system we might have an ontology of historical persons.
The ontology might form features to tell us if the person was an inventor or politician.
A simple ontology can be constructed for job titles from query logs and reviewed / pruned by hand.
Supervised ML approaches can get you pretty close to this as well.
Conceptual relationships are easier to model from phrases than from individual words or simpler language.
- The accuracy or broadness of a search query should be related to how well we understand what's being searched for.
- In the absence of high confidence, search should fall back to a default algorithm.
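That fallback rule fits in a few lines: use the rewritten, entity-aware query only when classification confidence clears a threshold; otherwise run the default term-based algorithm. The threshold and query strings are illustrative:

```python
def choose_query(raw_query, rewritten_query, confidence, threshold=0.7):
    """Use the entity-aware rewrite only when we trust the classification."""
    if confidence >= threshold:
        return rewritten_query
    return raw_query  # fall back to default term search

print(choose_query("java dev", 'title:"java developer"', 0.9))
print(choose_query("jv dv", 'title:"java developer"', 0.3))
```

This keeps accuracy proportional to confidence: a confident rewrite narrows the query, an uncertain one leaves recall broad.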