No notes for this slide
Language causes problems for searchers. With Boolean searching, users need a lot of "qualifiers" (THIS and NOT THIS, as well as proximity operators), all of which try to focus in on the real concept. These are examples of this simple problem, which causes searchers a lot of headaches. The yellow box gives additional common problems.
Semantic searching is also called concept searching. It helps the user by searching on related concepts, so it can find documents that do not contain the keywords used in the actual search. As you know, patents are frequently written in a way that intentionally obfuscates the actual technology being patented. Thus, semantic searching is of high interest to this community. The technical term for "real" semantic searching is latent semantic analysis, or LSA. Essentially, you feed the computer tens of thousands or millions of documents, and the computer records how every word is used across every document. As it does so, it begins to find statistically significant connections between words and phrases (concepts), so that a user's keywords can be expanded beyond their literal spelling to include the actual concept of interest to the searcher. Many vendors claim to have semantic searching, but nearly always they are stretching the definition of the term to include older search techniques based on dictionaries, thesauri, lexicons and taxonomies. Real semantic searching does not depend on any of these.
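To make the LSA idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the six-document toy corpus is invented, the helper names are hypothetical, and the decomposition uses a simple orthogonal (power) iteration in pure Python so the example has no dependencies. A production engine would run a truncated SVD from a linear algebra library over millions of documents.

```python
import math

# Toy corpus, invented for illustration only (not real patent text).
DOCS = [
    "marine corps naval base quantico",
    "usmc naval training quantico",
    "naval corps usmc officers",
    "marine ecology coastal habitat",
    "coastal oceanography marine survey",
    "ecology oceanography coastal habitat",
]


def term_document_matrix(docs):
    """Return (sorted vocabulary, one row of raw term counts per term)."""
    vocab = sorted({w for d in docs for w in d.split()})
    rows = [[d.split().count(term) for d in docs] for term in vocab]
    return vocab, rows


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_document_directions(rows, k, iterations=200):
    """Approximate the top-k right singular vectors of the term-document
    matrix A by orthogonal iteration on the Gram matrix A^T A."""
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
            for i in range(n)]
    # Deterministic, linearly independent starting vectors.
    basis = [[1.0 / (1 + ((i + j) % n)) for i in range(n)] for j in range(k)]
    for _ in range(iterations):
        basis = [[dot(g_row, b) for g_row in gram] for b in basis]
        ortho = []
        for b in basis:  # Gram-Schmidt keeps the vectors orthonormal
            for q in ortho:
                proj = dot(b, q)
                b = [x - proj * y for x, y in zip(b, q)]
            norm = math.sqrt(dot(b, b))
            ortho.append([x / norm for x in b])
        basis = ortho
    return basis


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


vocab, rows = term_document_matrix(DOCS)
basis = top_document_directions(rows, k=2)
# Each term becomes a point in a 2-dimensional "concept" space.
vectors = {t: [dot(r, q) for q in basis] for t, r in zip(vocab, rows)}

print("naval vs usmc:   ", round(cosine(vectors["naval"], vectors["usmc"]), 3))
print("naval vs ecology:", round(cosine(vectors["naval"], vectors["ecology"]), 3))
```

In this sketch "naval" and "usmc" end up pointing in nearly the same direction while "naval" and "ecology" do not, even though no dictionary or thesaurus was consulted; the relationship comes only from how the words are used across the documents.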
Here's an example of what I mean. As I said, the computer creates these relationships and builds what is called a "vector space." [Just as a matter of trivia, a vector space can have any number of dimensions. Some true semantic search engines have only a few dimensions. Our semantic search engine will have in excess of 300 dimensions.] Here is an example of the hyperspace around the word "marine." As it built the hyperspace, you will notice that it recognized a close relationship between the words naval, Quantico, Corp and USMC. Similarly, but at some distance from that set of terms, it noticed a tight relationship between the words ecology, coastal, oceanography and oceanic. The result is that the computer can identify which concept you are talking about based on all of the other words, and the ways, that people have used to discuss that same concept.
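The "neighborhood" around a word can be sketched as a nearest-neighbor query in the vector space. The Python below is only an illustration under stated assumptions: the toy corpus is invented, the function names are hypothetical, and the vectors are raw same-document co-occurrence counts (one dimension per vocabulary word) rather than the reduced LSA vectors a real engine would use.

```python
import math
from collections import defaultdict
from itertools import permutations

# Toy corpus, invented for illustration only.
DOCS = [
    "marine corps naval base quantico",
    "usmc naval training quantico",
    "naval corps usmc officers",
    "marine ecology coastal habitat",
    "coastal oceanography marine survey",
    "ecology oceanography coastal habitat",
]


def cooccurrence_vectors(docs):
    """Map each word to a sparse vector counting the documents it shares
    with every other word."""
    vectors = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for a, b in permutations(set(doc.split()), 2):
            vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    num = sum(u[w] * v[w] for w in shared)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0


def neighbors(word, vectors, n=3):
    """Return the n words whose vectors lie closest to `word`."""
    target = vectors.get(word, {})
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)
    return [(other, round(score, 3)) for score, other in scored[:n]]


VECTORS = cooccurrence_vectors(DOCS)
print(neighbors("usmc", VECTORS))     # military-flavored neighbors
print(neighbors("ecology", VECTORS))  # ocean-science neighbors
```

On this toy data the closest neighbors of "usmc" are all military terms, and "usmc" has zero similarity to "ecology", which mirrors the two separate clusters around "marine" on the slide.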
Well, there are challenges to semantic searching. In short, all attempts at bringing semantic search to the marketplace thus far have treated it as a "black box" that requires users to surrender transparency and control over the actual search. In addition, even assuming for the moment that the user can get over the black box issues, scale becomes a real problem: the content they need to search is stored in so many places, using so many indices, some of which are under their control and some of which never will be. Most truly semantic engines on the market today require the vendor to host, index and search the content using the vendor's own semantic technology. This is clearly untenable in today's growing information market. The result is that, as much as users would like to use semantic searching, they have not seen a practical implementation to date.