Tovek Presentation by Livio Costantini
1. Livio Costantini. Tovek’s Tools: Software to Access Unstructured Information. Auhofstrasse 25/2, 1130 Wien. E-mail: Livio.Costantini@Gmail.com, Tel. 0043-1-8794274, Mobile: 0043-664-9919154
5. The Data Retrieval and Document Retrieval Models. The most prominent differences between the two models arise from a more fundamental problem: the representation of indeterminacy. This indeterminacy results from the effects of semantic ambiguity and of system (“corpus”) size, and the differences it creates influence how the systems are designed, used and managed. Semantic ambiguity is a measure of the number of different senses a “word and/or phrase” has. System (corpus) size is the number of times that a given “word and/or phrase” is used to represent an item of information.
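To make the two measures concrete, here is a minimal Python sketch that computes them for a toy corpus. It assumes a hand-made sense inventory (the hypothetical `SENSES` dictionary) and naive lowercase whitespace tokenization; a real system would rely on a lexical resource and a proper tokenizer.

```python
from collections import Counter

# Hypothetical sense inventory: term -> set of distinct senses (illustrative only).
SENSES = {
    "bank": {"financial institution", "river side"},
    "reactor": {"nuclear reactor", "chemical reactor"},
    "treaty": {"international agreement"},
}

def semantic_ambiguity(term: str) -> int:
    """Number of different senses recorded for a term (0 if the term is unknown)."""
    return len(SENSES.get(term, set()))

def corpus_frequency(term: str, documents: list[str]) -> int:
    """Number of times a term occurs in the corpus (naive lowercase whitespace tokens)."""
    counts = Counter(token for doc in documents for token in doc.lower().split())
    return counts[term]

docs = ["The reactor near the bank", "A new reactor treaty", "Bank reactor reactor"]
for term in ("reactor", "bank", "treaty"):
    print(term, semantic_ambiguity(term), corpus_frequency(term, docs))
```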
12. The Problem. 80% of information consists of unstructured textual documents. Imagine it as an iceberg! If you can see only the surface, you may become frustrated by your inability to see what lies inside. Using standard tools and basic search engines (or not using them at all), you can find only the proverbial tip of the iceberg.
20. Design a Topic Tree: Knowledge Elicitation Process. Knowledge is extracted from subject-area experts; the two roles involved are the Subject-Area Expert and the Knowledge Engineer. The Knowledge Engineer extracts and organizes the knowledge of the Subject-Area Expert and expresses it in a hierarchical format that can be used in a “Topic Tree” environment.
22. The Importance of Topic Trees. Topic Trees capture corporate intellectual property, or business rules, so that it can be reused by employees. They are available to end users as a shared resource and provide a convenient means of encapsulating the expert’s knowledge in a hierarchical structure. Topic Trees include all the components of the Verity Query Language (conceptual and proximity operators, modifiers and weights), and they are designed to take the context of a text into account in order to retrieve documents related to a “topic” of interest.
23. Topic Tree: Accrue Operator. The Accrue operator applies a “the more the better” approach when assigned to a topic or to a search: the more of the topic’s children are found in a document, the more closely the document is considered to match your search. Documents that contain the largest number of highly weighted children are ranked highest in the result list.
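The slide does not give the scoring formula behind Accrue, so the sketch below assumes one: each matched child contributes its weight, and the weights are combined as a probabilistic OR, score = 1 - ∏(1 - w) over the matched children. This keeps the two properties described above (more matching children give a higher score, and highly weighted children count more) while staying in [0, 1]. The child words and weights in `CHILDREN` are hypothetical, and this is not claimed to be Verity’s or Tovek’s actual formula.

```python
# Hypothetical topic: child word -> weight in [0, 1].
CHILDREN = {"nuclear": 0.8, "reactor": 0.6, "uranium": 0.5, "test": 0.2}

def accrue_score(document: str, children: dict[str, float]) -> float:
    """'The more the better': every matched child raises the score.

    Assumed combination rule (probabilistic OR): 1 - prod(1 - w) over matched children.
    """
    tokens = set(document.lower().split())
    miss = 1.0  # running product of (1 - w) for the children matched so far
    for word, weight in children.items():
        if word in tokens:
            miss *= 1.0 - weight
    return 1.0 - miss

def rank(documents: list[str], children: dict[str, float]) -> list[str]:
    """Order documents from highest to lowest accrued score."""
    return sorted(documents, key=lambda d: accrue_score(d, children), reverse=True)

docs = ["nuclear reactor uranium test", "nuclear test", "weather report"]
for doc in rank(docs, CHILDREN):
    print(f"{accrue_score(doc, CHILDREN):.3f}  {doc}")
```

With these assumed weights, the document containing all four children ranks first, and a document with no matching child scores zero.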
24. Topic Tree: Sentence, Any, Word and Stem Operators. The Word operator performs the basic search and selects documents that include one or more instances of the exact word specified as the search element. The Stem operator expands the search to include a list of words derived from the original search word. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem. The Sentence operator indicates that the children of a sub-topic must be located within the same sentence of a document. The Any operator retrieves a document that contains at least one of the specified search elements.
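The four operators can be illustrated with a minimal Python sketch over plain strings. It is a simplification under stated assumptions: `crude_stem` is a hypothetical suffix stripper (not the stemmer used by Verity or Tovek), sentences are split naively on `.`, `!` and `?`, and each operator returns only match or no match, without scores.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word: str) -> str:
    """Hypothetical suffix stripper: related forms only need to map to the same stem."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def word_op(doc: str, word: str) -> bool:
    """WORD: the exact word occurs at least once."""
    return word.lower() in tokenize(doc)

def stem_op(doc: str, word: str) -> bool:
    """STEM: some token maps to the same stem as the search word."""
    target = crude_stem(word.lower())
    return any(crude_stem(t) == target for t in tokenize(doc))

def sentences(doc: str) -> list[str]:
    """Naive sentence split on terminal punctuation."""
    return [s for s in re.split(r"[.!?]+", doc) if s.strip()]

def sentence_op(doc: str, words: list[str]) -> bool:
    """SENTENCE: all children occur together within a single sentence."""
    return any(all(word_op(s, w) for w in words) for s in sentences(doc))

def any_op(doc: str, words: list[str]) -> bool:
    """ANY: at least one child occurs in the document."""
    return any(word_op(doc, w) for w in words)

doc = "The inspectors tested the reactor. Uranium was found later."
print(word_op(doc, "test"))                      # False: only "tested" occurs
print(stem_op(doc, "tests"))                     # True: "tests" and "tested" share a stem
print(sentence_op(doc, ["reactor", "uranium"]))  # False: they are in different sentences
print(any_op(doc, ["plutonium", "uranium"]))     # True: "uranium" is present
```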
25. Topic Trees: A Knowledge Representation of the “Ferrari Concept”. Topic trees are predefined queries in tree-like form that can be used for searching, mining and taxonomy classification.
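Because a topic tree is a predefined query in tree form, it maps naturally onto a recursive data structure. The Python sketch below is a hypothetical illustration, not Tovek’s file format or engine: a `Topic` node carries an operator, a weight and its children, leaves are single words, and only a simplified subset of operators is implemented (`word`, `any`, and `accrue`, the last using an assumed probabilistic-OR rule). The `classify` helper shows the taxonomy-classification use: a document is assigned to every topic whose score reaches a chosen threshold (0.5 here, also an assumption). The example tree is invented and does not reproduce the “Ferrari Concept”.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """A node of a hypothetical topic tree; a leaf is a single word."""
    name: str
    operator: str = "word"   # simplified subset: "word", "any" or "accrue"
    weight: float = 1.0      # importance of this node to its parent
    children: list["Topic"] = field(default_factory=list)

def evaluate(node: Topic, tokens: set[str]) -> float:
    """Recursively score a document (given as a token set) against a topic, in [0, 1]."""
    if node.operator == "word":
        return 1.0 if node.name in tokens else 0.0
    scores = [child.weight * evaluate(child, tokens) for child in node.children]
    if node.operator == "any":
        return max(scores, default=0.0)
    if node.operator == "accrue":
        miss = 1.0
        for s in scores:
            miss *= 1.0 - s
        return 1.0 - miss
    raise ValueError(f"unsupported operator: {node.operator}")

def classify(doc: str, topics: list[Topic], threshold: float = 0.5) -> list[str]:
    """Names of the topics whose score for the document reaches the threshold."""
    tokens = set(doc.lower().split())
    return [t.name for t in topics if evaluate(t, tokens) >= threshold]

# Hypothetical topic tree.
TREE = Topic("nuclear-safety", "accrue", children=[
    Topic("reactor", weight=0.7),
    Topic("inspection", weight=0.5),
    Topic("fuel", "any", weight=0.6, children=[Topic("uranium"), Topic("plutonium")]),
])

print(classify("reactor inspection report", [TREE]))
print(classify("weather forecast", [TREE]))
```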
27. Topic Tree: Quality Assurance Procedures and Testing Process. Quality assurance addresses three typical problems with keywords and proximity operators. Keywords that are too general: consider whether these keywords should be eliminated or used with new, more restrictive proximity conditions. Proximity conditions that are excessively restrictive: they prevent combinations of keywords from contributing to the retrieval of a document in the expected manner. Original keywords that need enriching: retrieved reports are examined for words that may serve as new keywords. Testing checks the performance of the topic trees against a “representative” collection of reports in which the reports dealing with the concepts covered by the topic trees have been identified in advance. Retrieval effectiveness is measured in terms of precision and recall.
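Precision is the fraction of retrieved reports that are relevant, |retrieved ∩ relevant| / |retrieved|; recall is the fraction of relevant reports that were retrieved, |retrieved ∩ relevant| / |relevant|. A minimal Python sketch of this test against a collection whose relevant reports were identified in advance; the report IDs are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a retrieved set against a known relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"r1", "r2", "r3", "r4"}  # identified in advance (hypothetical IDs)
retrieved = {"r1", "r2", "r5"}       # returned by the topic tree under test
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

As a rule of thumb, keywords that are too general tend to lower precision, while excessively restrictive proximity conditions tend to lower recall.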
33. Tovek Agent User Interface: Automatic Clustering. The ability to create a hierarchy of collections, which can be used individually or concatenated.
34. Tovek Agent: Selecting Collections. Find documents that satisfy specific criteria (e.g. “Nuclear”, “test”) in document fields or metadata. Screen elements: Selecting collections, Documents found, Total documents, Result list, Search pane and search elements.
38. Tovek Agent: Document Properties. Examine the matched words (highlighted) in the selected document.
39. Tovek Agent: Extract Adjacent Words. The ability to extract highlighted words from selected documents, together with the words adjacent to them (preceding or following). Search criterion: President.
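Extracting adjacent words amounts to a keyword-in-context (KWIC) view. A minimal Python sketch, assuming a fixed window of two words on each side and simple word tokenization that ignores sentence boundaries; the function name `adjacent_words` and the sample text are hypothetical.

```python
import re

def adjacent_words(text: str, term: str, window: int = 2) -> list[tuple[list[str], str, list[str]]]:
    """For each occurrence of term, return (preceding words, match, following words)."""
    tokens = re.findall(r"\w+", text)
    results = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():  # case-insensitive match, original casing kept
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            results.append((before, tok, after))
    return results

text = "Yesterday the President met the press. Later the President of Austria spoke."
for before, hit, after in adjacent_words(text, "president"):
    print(" ".join(before), f"[{hit}]", " ".join(after))
```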
43. Tovek Query Editor. Lets advanced users construct more complex queries in order to create topic trees.
44. InfoRating. InfoRating is an analytical and data visualization tool that assists users in performing context analysis, together with a graphical representation of aggregated documents. It provides context analysis by matching an extracted list of documents against a set of queries, and the documents in the result list can be visualized in multiple ways. Information is presented graphically in ways that make it easy to observe trends and general characteristics. InfoRating organizes documents by the criteria and categories the user has requested and then delivers the conclusions to the user. It categorizes documents into navigable structures to help the user find relevant information and understand the context of a collection.
45. Connection Chart. Shows the relationships between queries and documents, together with their scores. Comments can be added to the queries and/or documents. Screen elements: switches for the main pane, query pane, main pane, documents pane.
46. Cross Matrix. Upper panel: the number of documents matching each possible pair of queries. Lower panel: the documents matching the selected element of the Cross Matrix.
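The Cross Matrix can be read as a table of set intersections: for each pair of queries, the cell holds the documents matched by both, and the upper panel shows the size of that set. A minimal Python sketch, assuming each query’s result is already available as a set of document IDs; the query names and IDs are hypothetical. The diagonal cells simply hold each query’s own result set.

```python
from itertools import product

def cross_matrix(matches: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """For every ordered pair of queries, the set of documents matched by both."""
    return {(a, b): matches[a] & matches[b] for a, b in product(matches, repeat=2)}

# Hypothetical query results: query name -> IDs of matching documents.
matches = {
    "IAEA": {"d1", "d2", "d3"},
    "Temelin": {"d2", "d3", "d4"},
    "inspection": {"d3", "d5"},
}
matrix = cross_matrix(matches)
names = list(matches)

# Upper panel: document counts for each pair of queries.
print("".ljust(12) + "".join(n.ljust(12) for n in names))
for a in names:
    print(a.ljust(12) + "".join(str(len(matrix[a, b])).ljust(12) for b in names))

# Lower panel: the documents behind one selected cell.
print(sorted(matrix["IAEA", "Temelin"]))
```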
47. Summary Graph. Visualisation of the query results in combination with different fields, such as Source or Date (e.g. query results per week).
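Grouping query results by week, as in the Summary Graph example, reduces to counting hits per (query, week) pair before plotting. A minimal Python sketch, assuming each hit is a (query, document date) pair and using ISO week numbers from `datetime.date.isocalendar()`; the sample hits are hypothetical, and the plotting step is omitted.

```python
from collections import Counter
from datetime import date

# Hypothetical result list: (query name, document date) for each matching document.
hits = [
    ("IAEA", date(2024, 3, 4)),
    ("IAEA", date(2024, 3, 6)),
    ("IAEA", date(2024, 3, 12)),
    ("Temelin", date(2024, 3, 5)),
]

def weekly_counts(results: list[tuple[str, date]]) -> Counter:
    """Count matching documents per (query, ISO year, ISO week)."""
    counts: Counter = Counter()
    for query, day in results:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(query, iso_year, iso_week)] += 1
    return counts

for (query, year, week), n in sorted(weekly_counts(hits).items()):
    print(f"{query:8} {year}-W{week:02d}: {n}")
```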
49. Harvester: Show and Hide Clusters (Chart / Show Clusters, Chart / Hide All)
51. Visualization of a “Descriptor” (Centrifuge) and its relation with partner words. Screen elements: Word List, Word History, Descriptors, Words Neighborhood, Working Pane, Partner Words, Result List.
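The slides do not define how partner words are selected. One common way to approximate such a word neighborhood is co-occurrence counting: for every occurrence of the descriptor, count the words within a few positions on either side and keep the most frequent ones. A minimal Python sketch of that assumption; the window size, the stop-word list and the sample documents are hypothetical, and Tovek’s actual method may differ.

```python
import re
from collections import Counter

# Hypothetical stop-word list, removed before counting neighbors.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "was"}

def partner_words(docs: list[str], descriptor: str, window: int = 3, top: int = 5) -> list[tuple[str, int]]:
    """Most frequent words co-occurring with the descriptor within +/- window tokens."""
    descriptor = descriptor.lower()
    counts: Counter = Counter()
    for doc in docs:
        tokens = [t for t in re.findall(r"\w+", doc.lower()) if t not in STOPWORDS]
        for i, tok in enumerate(tokens):
            if tok == descriptor:
                neighborhood = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                counts.update(w for w in neighborhood if w != descriptor)
    return counts.most_common(top)

docs = [
    "Uranium enrichment centrifuge cascade",
    "The centrifuge cascade was inspected",
    "New centrifuge design for enrichment",
]
print(partner_words(docs, "centrifuge", top=3))
```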
52. Descriptors can be used as an input query in conjunction with Tovek Agent.
53. Visualization of a “Descriptor” (IAEA) and its relation with partner words
54. Visualization of a “Descriptor” (Temelin) and its relation with partner words