Learning Better Context Characterizations: An Intelligent Information Retrieval Approach

Carlos M. Lorenzetti, Ana G. Maguitman
Universidad Nacional del Sur, Av. L.N. Alem 1253, Bahía Blanca, Argentina
Grupo de Investigación en Recuperación de Información y Gestión del Conocimiento (Information Retrieval and Knowledge Management Research Group)
Laboratorio de Investigación y Desarrollo en Inteligencia Artificial (Artificial Intelligence Research and Development Laboratory)
CONICET, AGENCIA

Information Retrieval limitations: the query "Java" can refer to Java as an island or to Java as a programming language.

Problems: ambiguity. "Java" can refer to topics in many categories: Animals, Computers, Consumables, Entertainment, Geography, Flora, Ships.

Proposed solutions

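Context Characterization: a context, drawn from sources such as articles and newspapers, is characterized by a word list in which each term Ti carries a weight pi: (T1, p1), (T2, p2), (T3, p3), (T4, p4), …, (Tn, pn).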

Context Characterization: term weights count how often each term occurs in the documents and penalize very common terms.
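The two criteria above, counting term occurrences and penalizing very common terms, are the two halves of the familiar TF-IDF weighting. The sketch below assumes one common variant, tf × log(N / df); the function name `tfidf_weights` and the toy corpus are illustrative and do not come from the slides.

```python
import math
from collections import Counter


def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} map per document using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw occurrence counts of each term in this document
        weights.append({
            # log(N / df) shrinks to 0 for a term found in every document
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


docs = [
    ["java", "island", "coffee", "java"],
    ["java", "virtual", "machine", "jvm"],
    ["java", "programming", "language"],
]
w = tfidf_weights(docs)
print(round(w[1]["jvm"], 3))  # 1.099 (rare term, high weight)
print(w[0]["java"])           # 0.0 (occurs in every document)
```

Although "java" occurs twice in the first document, it receives no weight because it appears in every document of this tiny collection.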

Different Role of Terms


Descriptors and Discriminators
Topic: Java Virtual Machine
Terms: Java, Language, Applets, Code, NetBeans, Computers, JVM, Ruby, Programming, JDK, Virtual, Machine
Some of these terms are good descriptors of the topic, and some are good discriminators.

Documents: Descriptors and Discriminators
Topic: Java Virtual Machine
Number of occurrences of term j in document i, for documents (1) to (4) and the initial context H:

term          (1)  (2)  (3)  (4)   H
jdk            0    3    3    0    0
jvm            0    1    2    0    0
province       1    0    0    4    0
island         2    0    0    4    0
coffee         3    0    0    3    0
programming    0    2    2    0    3
language       1    1    2    0    1
virtual        0    1    1    0    1
machine        0    2    3    6    2
java           2    5    5    2    4

Documents: Descriptors
Topic: Java Virtual Machine
Descriptive power of a term in a document (here, the initial context H):

term          count in H   descriptive power
jdk                0             0.000
jvm                0             0.000
province           0             0.000
island             0             0.000
coffee             0             0.000
programming        3             0.539
language           1             0.180
virtual            1             0.180
machine            2             0.359
java               4             0.718
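The descriptive-power column can be reproduced by scaling the context's count vector to unit Euclidean length, λ(H, t_j) = count_j / sqrt(Σ_k count_k²); for java this gives 4 / sqrt(31) ≈ 0.718. This formula is a reconstruction that matches the numbers on the slide, not a definition quoted from the deck, and the function name is illustrative.

```python
import math

TERMS = ["jdk", "jvm", "province", "island", "coffee",
         "programming", "language", "virtual", "machine", "java"]
H = [0, 0, 0, 0, 0, 3, 1, 1, 2, 4]  # term counts of the initial context, in TERMS order


def descriptive_power(counts: list[int]) -> list[float]:
    """Assumed lambda(d, t_j) = count_j / sqrt(sum_k count_k^2), i.e. a unit-length vector."""
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        return [0.0] * len(counts)  # empty document: no term describes it
    return [c / norm for c in counts]


lam = descriptive_power(H)
for term, value in zip(TERMS, lam):
    print(f"{term:12s} {value:.3f}")
```

Because the vector has unit length, the squared descriptive powers of one document always sum to 1.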

Documents: Discriminators
Topic: Java Virtual Machine
Discriminating power of a term in a document (here, the initial context H):

term          count in H   discriminating power
jdk                0             0.000
jvm                0             0.000
province           0             0.000
island             0             0.000
coffee             0             0.000
programming        3             0.577
language           1             0.500
virtual            1             0.577
machine            2             0.500
java               4             0.447
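The discriminating-power column matches 1 / sqrt(n_j) for each term that occurs in H (and 0 otherwise), where n_j is the number of documents containing term j, counting documents (1) to (4) together with H itself. For example, java occurs in all five, giving 1 / sqrt(5) ≈ 0.447. The sketch assumes this reading and also assumes that the slide's count matrix lists terms as rows and documents (1) to (4) as columns; the function name is illustrative.

```python
import math

TERMS = ["jdk", "jvm", "province", "island", "coffee",
         "programming", "language", "virtual", "machine", "java"]
# One row per document, columns in TERMS order (assumed reading of the count matrix).
DOCS = [
    [0, 0, 1, 2, 3, 0, 1, 0, 0, 2],  # document (1)
    [3, 1, 0, 0, 0, 2, 1, 1, 2, 5],  # document (2)
    [3, 2, 0, 0, 0, 2, 2, 1, 3, 5],  # document (3)
    [0, 0, 4, 4, 3, 0, 0, 0, 6, 2],  # document (4)
    [0, 0, 0, 0, 0, 3, 1, 1, 2, 4],  # initial context H
]


def discriminating_power(docs: list[list[int]], i: int) -> list[float]:
    """Assumed delta(t_j, d_i) = 1/sqrt(n_j) if term j occurs in docs[i], else 0,
    where n_j is the number of documents in `docs` that contain term j."""
    n_terms = len(docs[i])
    doc_freq = [sum(1 for d in docs if d[j] > 0) for j in range(n_terms)]
    # doc_freq[j] >= 1 whenever docs[i][j] > 0, so the division is safe
    return [1 / math.sqrt(doc_freq[j]) if docs[i][j] > 0 else 0.0
            for j in range(n_terms)]


disc_h = discriminating_power(DOCS, 4)  # index 4 is the initial context H
for term, value in zip(TERMS, disc_h):
    print(f"{term:12s} {value:.3f}")
```

A term that is spread over many documents gets a small value, so the measure rewards terms concentrated in few documents.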

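Document comparison criteria: document similarity
Figure: documents d1 and d2 drawn as vectors in a term space with axes K1, K2, K3; the angle θ between them measures how similar they are.
Cosine similarity: cos θ = (d1 · d2) / (|d1| |d2|)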
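Cosine similarity compares the direction of two term-count vectors and ignores their length. The sketch applies it to the initial context H and two of the example documents; the counts assume the slide's count matrix lists terms as rows and documents (1) to (4) as columns, and the function name is illustrative.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Term order: jdk, jvm, province, island, coffee, programming, language, virtual, machine, java
H = [0, 0, 0, 0, 0, 3, 1, 1, 2, 4]   # initial context
d2 = [3, 1, 0, 0, 0, 2, 1, 1, 2, 5]  # document (2)
d4 = [0, 0, 4, 4, 3, 0, 0, 0, 6, 2]  # document (4)
print(round(cosine_similarity(H, d2), 3))  # 0.857
print(round(cosine_similarity(H, d4), 3))  # 0.399
```

Document (2), which shares programming, language, virtual, machine, and java with H, is far closer to the context than document (4), which mostly shares only machine and java.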

Topics: Descriptors
Topic: Java Virtual Machine
Descriptive power of each term in the topic of a document (here, the initial context H), in ascending order:

term          count in H   topic descriptive power
virtual            1             0.014
jvm                0             0.032
province           0             0.040
language           1             0.040
programming        3             0.055
island             0             0.064
coffee             0             0.089
jdk                0             0.124
machine            2             0.158
java               4             0.385
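The topic descriptive powers can be reproduced as a similarity-weighted average of each term's squared descriptive power in the other documents: Λ(H, t_j) = Σ_k cos(H, d_k) · λ(d_k, t_j)² / Σ_k cos(H, d_k), with k ranging over documents (1) to (4). Because each document's squared descriptive powers sum to 1, the ten values also sum to 1. This is a reconstruction checked only against the numbers on the slide, not a formula quoted from the deck; it assumes the count matrix lists terms as rows and documents (1) to (4) as columns, and the helper names are hypothetical.

```python
import math

TERMS = ["jdk", "jvm", "province", "island", "coffee",
         "programming", "language", "virtual", "machine", "java"]
DOCS = [
    [0, 0, 1, 2, 3, 0, 1, 0, 0, 2],  # document (1)
    [3, 1, 0, 0, 0, 2, 1, 1, 2, 5],  # document (2)
    [3, 2, 0, 0, 0, 2, 2, 1, 3, 5],  # document (3)
    [0, 0, 4, 4, 3, 0, 0, 0, 6, 2],  # document (4)
]
H = [0, 0, 0, 0, 0, 3, 1, 1, 2, 4]  # initial context


def unit(v: list[int]) -> list[float]:
    """Scale counts to unit length: the per-document descriptive power lambda."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0] * len(v)


def cosine(u: list[int], v: list[int]) -> float:
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def topic_descriptive_power(context: list[int], others: list[list[int]]) -> list[float]:
    """Assumed Lambda: similarity-weighted average of lambda(d_k, t_j)^2 over `others`."""
    sims = [cosine(context, d) for d in others]
    units = [unit(d) for d in others]
    total = sum(sims)
    return [sum(s * u[j] ** 2 for s, u in zip(sims, units)) / total
            for j in range(len(context))]


topic_desc = dict(zip(TERMS, topic_descriptive_power(H, DOCS)))
for term, value in sorted(topic_desc.items(), key=lambda item: item[1]):
    print(f"{term:12s} {value:.3f}")
```

Unlike the per-document descriptive power, this measure credits terms such as jdk, which H itself lacks but which dominate documents similar to H.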

Topics: Discriminators
Topic: Java Virtual Machine
Discriminating power of each term in the topic of a document (here, the initial context H), in ascending order:

term          count in H   topic discriminating power
province           0             0.385
island             0             0.385
coffee             0             0.385
java               4             0.493
language           1             0.517
machine            2             0.524
programming        3             0.566
virtual            1             0.566
jdk                0             0.848
jvm                0             0.848
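The topic discriminating powers match summing cos(d_k, H) / n_j over the documents d_k among (1) to (4) that contain term j, where n_j counts every document containing the term, H included. Equivalently, this sums δ(t_j, d_k)² · cos(d_k, H), since δ² = 1 / n_j. For example, jdk occurs only in (2) and (3), so its value is (0.8568 + 0.8400) / 2 ≈ 0.848. This is a reconstruction checked against the slide's numbers, not a quoted definition; it assumes the count matrix lists terms as rows and documents (1) to (4) as columns, and the helper names are hypothetical.

```python
import math

TERMS = ["jdk", "jvm", "province", "island", "coffee",
         "programming", "language", "virtual", "machine", "java"]
DOCS = [
    [0, 0, 1, 2, 3, 0, 1, 0, 0, 2],  # document (1)
    [3, 1, 0, 0, 0, 2, 1, 1, 2, 5],  # document (2)
    [3, 2, 0, 0, 0, 2, 2, 1, 3, 5],  # document (3)
    [0, 0, 4, 4, 3, 0, 0, 0, 6, 2],  # document (4)
]
H = [0, 0, 0, 0, 0, 3, 1, 1, 2, 4]  # initial context


def cosine(u: list[int], v: list[int]) -> float:
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norms if norms else 0.0


def topic_discriminating_power(context: list[int], others: list[list[int]]) -> list[float]:
    """Assumed Delta(t_j, context) = sum of cos(d_k, context) / n_j over d_k containing t_j."""
    collection = others + [context]  # n_j also counts the context itself
    n_terms = len(context)
    doc_freq = [sum(1 for d in collection if d[j] > 0) for j in range(n_terms)]
    sims = [cosine(context, d) for d in others]
    return [sum(s / doc_freq[j] for s, d in zip(sims, others) if d[j] > 0)
            for j in range(n_terms)]


topic_disc = dict(zip(TERMS, topic_discriminating_power(H, DOCS)))
for term, value in sorted(topic_disc.items(), key=lambda item: item[1]):
    print(f"{term:12s} {value:.3f}")
```

Terms confined to a few documents that are similar to H, such as jdk and jvm, come out on top, while terms spread across unrelated documents, such as island, stay low.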

Proposed Algorithm
Figure: the terms of the context (w1, w2, …, wm) are ranked into two weighted lists, DESCRIPTORS (example weights 0.5, 0.25, …, 0.1) and DISCRIMINATORS (example weights 0.4, 0.37, …, 0.01). A roulette selects terms from these lists to form queries 01 to n, which return results 01 to n. The figure numbers the steps 1 to 4.
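The figure names roulette selection as the way terms are drawn from the weighted descriptor and discriminator lists to build queries. A minimal sketch of roulette-wheel (fitness-proportionate) sampling follows. The query size, the equal split between the two lists, the assignment of the figure's example weights to particular terms, and the names `roulette_pick` and `build_query` are all hypothetical choices made for illustration.

```python
import bisect
import itertools
import random


def roulette_pick(weights: dict[str, float], rng: random.Random) -> str:
    """Roulette-wheel selection: pick a key with probability proportional to its weight."""
    terms = list(weights)
    cumulative = list(itertools.accumulate(weights[t] for t in terms))
    if not terms or cumulative[-1] <= 0:
        raise ValueError("need at least one positive weight")
    spin = rng.random() * cumulative[-1]           # landing point on the wheel
    index = bisect.bisect_right(cumulative, spin)  # first slot whose upper edge exceeds spin
    return terms[min(index, len(terms) - 1)]       # guard against float round-off at the edge


def build_query(descriptors: dict[str, float], discriminators: dict[str, float],
                n_each: int, rng: random.Random) -> list[str]:
    """Hypothetical query builder: up to n_each distinct terms drawn from each list."""
    query: list[str] = []
    for pool in (descriptors, discriminators):
        remaining = {t: w for t, w in pool.items() if w > 0 and t not in query}
        for _ in range(min(n_each, len(remaining))):
            term = roulette_pick(remaining, rng)
            query.append(term)
            del remaining[term]                    # draw without replacement
    return query


rng = random.Random(42)
# Illustrative weights echoing the figure; which term gets which weight is made up.
descriptors = {"java": 0.5, "machine": 0.25, "virtual": 0.1}
discriminators = {"jvm": 0.4, "jdk": 0.37, "programming": 0.01}
print(" ".join(build_query(descriptors, discriminators, n_each=2, rng=rng)))
```

Sampling instead of always taking the top-ranked terms lets repeated queries explore different term combinations while still favoring heavily weighted terms.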

Evaluation
Figure: a topic hierarchy with nodes Top, Home, Science, Arts, Cooking, Family, and Childcare, arranged in 1st, 2nd, and 3rd levels.

Evaluation – Similarity
Figure: novelty-driven similarity (y-axis, 0 to 1) versus iteration (x-axis, 0 to 180) for the topic Top/Computers/Open_Source/Software, with maximum, average, and minimum curves. Annotations on the plot mark the context update and the query formulation and retrieval process.
Novelty-driven similarity, first versus best iteration:
1st iteration: mean 0.0661, 95% CI [0.0618; 0.0704]
Best iteration: mean 0.5970, 95% CI [0.5866; 0.6073]
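The similarity results are reported as a mean with a 95% confidence interval. The slides do not say how the intervals were computed; the sketch below assumes the common normal approximation, mean ± 1.96 · s / sqrt(N), and uses made-up scores only to show the arithmetic. The function name is illustrative.

```python
import math
import statistics


def mean_ci95(values: list[float]) -> tuple[float, float, float]:
    """Return (mean, lower, upper) using mean +/- 1.96 * s / sqrt(N); needs N >= 2."""
    n = len(values)
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)  # s = sample std. deviation
    return mean, mean - half_width, mean + half_width


# Hypothetical per-topic similarity scores, not data from the slides.
scores = [0.55, 0.62, 0.58, 0.64, 0.60]
m, lo, hi = mean_ci95(scores)
print(f"mean {m:.4f}, 95% CI [{lo:.4f}; {hi:.4f}]")
```

The interval narrows as 1 / sqrt(N), so averaging over many topics yields tight bounds like the ones reported.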

Evaluation – Precision
Figure: first-iteration precision (x-axis, 0 to 1) versus best-iteration precision (y-axis, 0 to 1). Points are marked as improvement observed (89.18%) or no improvement observed.

Evaluation – Recall
Figure: first-iteration recall (x-axis, 0 to 0.35) versus best-iteration recall (y-axis, 0 to 0.35). Points are marked as improvement observed (89.38%) or no improvement observed.
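The precision and recall plots compare the first iteration against the best iteration and report the share of cases with an improvement. The sketch shows the standard set-based definitions of precision and recall and one way to compute such a share. It assumes each point is one topic and that "improvement" means a strictly higher score; the document IDs, scores, and function names are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = |retrieved & relevant| / |retrieved|; recall = |retrieved & relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def improvement_rate(first: list[float], best: list[float]) -> float:
    """Fraction of topics whose best-iteration score is strictly above the first-iteration score."""
    improved = sum(1 for f, b in zip(first, best) if b > f)
    return improved / len(first)


p, r = precision_recall({"a", "b", "x"}, {"a", "b", "c", "d"})
print(round(p, 3), round(r, 3))  # 0.667 0.5
print(round(improvement_rate([0.1, 0.2, 0.3], [0.4, 0.2, 0.5]), 3))  # 0.667
```

Under this convention a topic whose score is unchanged counts as "no improvement", matching the two categories in the plots.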

Conclusions

Thank you!
CONICET, AGENCIA
Laboratorio de Investigación y Desarrollo en Inteligencia Artificial: lidia.cs.uns.edu.ar
Universidad Nacional del Sur, Bahía Blanca: www.uns.edu.ar
