Towards advanced data retrieval from learning objects repositories
2. TOWARDS ADVANCED DATA RETRIEVAL
FROM LEARNING OBJECTS REPOSITORIES
Valentina Paunovic
Belgrade Metropolitan University
Slobodan Jovanovic
Belgrade Metropolitan University
This work was supported by the Ministry of Education, Science and Technology
(Project III44006).
3. What problem do we solve?
• Popularity of personalized distance-based learning demands effective creation of learning materials
• REUSABILITY enables effective creation of learning materials
• SEARCH enables REUSABILITY
5. Our system - contributions
• Search engine
– Steiner-trees approach
– Algorithm for graph representation of LOR.
• Query language
– Extension based on formal logic.
– Algorithm for parsing extended language.
7. Steiner trees approach
• Query
– word1, word2, word3
• Possible interpretation
– Find all objects such that each object contains all
words from the query
– Issue: what if there is no such object?
• Alternative interpretation
– Find all groups of related objects such that each
group contains all words from the query
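The difference between the two interpretations can be illustrated with a small brute-force sketch. It is not the Steiner-tree search itself: the toy repository, the `related` pairs and the function names are all hypothetical, and exhaustive enumeration is only workable for a handful of objects.

```python
from itertools import combinations

# Hypothetical toy repository: learning-object id -> words it contains.
objects = {
    "lo1": {"exponential", "function"},
    "lo2": {"logarithm", "function"},
    "lo3": {"derivative"},
}
# Hypothetical "related" pairs (undirected edges between learning objects).
related = {frozenset({"lo1", "lo3"}), frozenset({"lo2", "lo3"})}


def single_object_matches(query):
    """First interpretation: objects that contain every query word."""
    return sorted(lo for lo, words in objects.items() if query <= words)


def is_connected(group):
    """True if the objects in `group` form one connected piece via `related`."""
    group = set(group)
    seen, stack = set(), [next(iter(group))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(o for o in group
                     if o not in seen and frozenset({node, o}) in related)
    return seen == group


def group_matches(query, max_size=3):
    """Alternative interpretation: connected groups that jointly cover the query."""
    found = []
    for size in range(1, max_size + 1):
        for group in combinations(sorted(objects), size):
            if any(set(g) <= set(group) for g in found):
                continue  # a smaller matching group is already contained in this one
            words = set().union(*(objects[o] for o in group))
            if query <= words and is_connected(group):
                found.append(group)
    return found


query = {"exponential", "derivative"}
print(single_object_matches(query))  # no single object holds both words
print(group_matches(query))          # but a pair of related objects does
```

In this toy data no single object contains both query words, so the first interpretation returns nothing, while the group interpretation still finds a pair of related objects.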
9. Ranking
• Groups with a smaller number of LOs:
– Stronger relationships among terms from the query
– Conclusion: advantage in rankings
– Example: the best solutions consist of only one LO
• Groups which contain more similar LOs (from the
same area or subject):
– Stronger relationships among terms from the query
– Conclusion: advantage in rankings
– Example: the best solutions are groups of LOs from the
same area
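One way to obtain both ranking effects at once is to rank each candidate group by the total cost of its tree edges. The sketch below assumes an edge cost of 1 - similarity; that cost function, the sample similarities and the names are illustrative assumptions, not taken from the slides.

```python
def tree_cost(edges, similarity):
    """Total cost of a candidate tree; edge cost = 1 - similarity (assumed)."""
    return sum(1.0 - similarity[frozenset(edge)] for edge in edges)


# Hypothetical pairwise similarities between learning objects.
similarity = {
    frozenset({"algebra1", "algebra2"}): 0.9,
    frozenset({"algebra1", "history1"}): 0.2,
}

# Hypothetical candidate results, each given as the list of its tree edges.
candidates = {
    "single LO": [],                            # one object, no edges
    "same area": [("algebra1", "algebra2")],    # two similar objects
    "mixed areas": [("algebra1", "history1")],  # two dissimilar objects
}

# Lower cost ranks first.
ranking = sorted(candidates, key=lambda name: tree_cost(candidates[name], similarity))
print(ranking)
```

Under this assumption a single LO has cost zero and ranks first, and a group of similar LOs ranks ahead of a group of dissimilar ones.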
10. Main advantages
• Situation: there is no object which satisfies all
terms from query
– Traditional search – no results
– Steiner trees search – returns results
• Possible to detect implicit relationships among
learning objects
11. Vector space model from text mining
• How to determine which LO are related?
• LO is represented as an m-dimensional TF-IDF vector:
$r(d) = (\mathit{tfidf}_1, \mathit{tfidf}_2, \ldots, \mathit{tfidf}_m)$
• Each component is calculated as $\mathit{tfidf}_i = \mathit{tf}_i \cdot \mathit{idf}_i$
• Term frequency: $\mathit{tf}_i = \sum_j h_j \, n(i, j)$
– $n(i, j)$ - number of occurrences of i-th term in the j-th slot of LO d
– $h_j$ - weight associated with the j-th slot.
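The weighted term frequency can be sketched as follows. The slot names, the numeric weights and the whitespace tokenization are assumptions made for illustration; the slides do not give concrete values.

```python
# Hypothetical slot weights h_j; the actual values are not given in the slides.
SLOT_WEIGHTS = {"title": 3.0, "keywords": 3.0, "description": 3.0, "content": 2.0}
DEFAULT_WEIGHT = 1.0  # assumed weight for any other searchable slot


def weighted_tf(lo, term):
    """tf_i = sum over slots j of h_j * n(i, j) for one learning object.

    `lo` maps a slot name to its text; words are split on whitespace.
    """
    return sum(
        SLOT_WEIGHTS.get(slot, DEFAULT_WEIGHT) * text.lower().split().count(term)
        for slot, text in lo.items()
    )


lo = {
    "title": "Exponential function",
    "content": "the exponential function grows fast exponential growth",
}
print(weighted_tf(lo, "exponential"))  # 3.0 * 1 + 2.0 * 2 = 7.0
```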
12. Vector space model II
• Weights:
– Terms from the metadata title, keywords and description
have the highest impact (weight).
– Terms from the content (if there is textual content)
have medium impact.
– Terms from the rest of the searchable metadata
have low impact.
• The purpose of inverse document frequency is to reduce
the impact of common words:
$\mathit{idf}_i = \log \frac{|LOR|}{|\{d \in LOR : w_i \in d\}|}$
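A minimal sketch of the inverse document frequency, assuming the repository is given as a list of term sets (one per LO) and that a term found in no LO gets weight zero; both choices are assumptions for illustration.

```python
import math


def idf(term, repository):
    """idf_i = log(|LOR| / |{d in LOR : w_i in d}|).

    `repository` is a list of term sets, one per learning object.
    """
    containing = sum(1 for terms in repository if term in terms)
    if containing == 0:
        return 0.0  # assumption: a term absent from the repository carries no weight
    return math.log(len(repository) / containing)


repository = [
    {"exponential", "function"},
    {"logarithmic", "function"},
    {"linear", "function"},
    {"quadratic", "function"},
]
print(idf("function", repository))     # appears everywhere, so log(4/4) = 0.0
print(idf("exponential", repository))  # appears once, so log(4/1)
```

A word that occurs in every LO contributes nothing, which is exactly the damping of common words described above.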
13. LO similarity measure
• Now we can introduce similarity measure
• One possibility - Cosine similarity
$sim(d_1, d_2) = \frac{r(d_1) \cdot r(d_2)}{\|r(d_1)\| \, \|r(d_2)\|}$
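Cosine similarity between two TF-IDF vectors can be sketched in a few lines. Returning 0.0 for a zero vector is an assumption made here to avoid division by zero.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # no shared terms
```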
14. Search algorithm
• Issue: finding the top-k minimum cost Steiner
trees (MCST-k) is NP-complete
• DBPF-k, developed for keyword search on databases:
– Has a polynomial solution
– The first returned result is optimal
– The remaining (k-1) results are approximate
• Efficiency of DBPF-k algorithm depends on
graph sparseness.
15. Graph representation of LOR
• Steiner-trees search requires sparse graph
• Graph representation of LOR:
– Nodes: LO
– Weighted edges: defined by similarity measure
between any two nodes
• Issue: dense graph - number of edges: $O((\text{number of LO})^2)$
• Result: Slow search
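As a rough worked figure, assuming an undirected graph with one edge between every pair of n learning objects (the repository size of 10,000 is a made-up example):

```latex
|E| = \binom{n}{2} = \frac{n(n-1)}{2} = O(n^2),
\qquad
n = 10\,000 \;\Rightarrow\; |E| = \frac{10\,000 \cdot 9\,999}{2} = 49\,995\,000
```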
16. Graph sparsification - rules
• No node should be removed from the graph.
• Low similarity edges should be removed from the graph.
• Edge removal should not violate graph connectivity.
• Targeted number of edges is specified by parameter T.
Graph obtained by sparsification process should have less
than T edges, unless it violates connectivity constraint.
• No priority among edges of equal weight
• If two learning objects are in relationship specified by the
metadata relation, it should be preserved in the graph
regardless of similarity degree between these two learning
objects.
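The slides state the rules but not the procedure, so the following is only one possible reading: keep every metadata-relation edge, add the strongest edges needed to keep the graph connected (Kruskal style, using union-find), then add further strongest edges while the total stays below T. The function name, the argument layout and the assumption that the input graph is connected are all ours.

```python
def sparsify(nodes, weighted_edges, required_edges, target):
    """Return the set of kept edges.

    weighted_edges: dict mapping frozenset({u, v}) -> similarity.
    required_edges: edges backed by a metadata relation (always kept).
    target: parameter T; the result has fewer than T edges unless
            connectivity or required edges force more.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False
        parent[root_a] = root_b
        return True

    kept = set()
    for edge in required_edges:  # metadata relations are preserved regardless
        kept.add(edge)
        union(*edge)

    # Ties keep their input order, so equal weights get no special priority.
    by_similarity = sorted(weighted_edges, key=weighted_edges.get, reverse=True)

    for edge in by_similarity:  # connectivity backbone: strongest joining edges
        if edge not in kept and union(*edge):
            kept.add(edge)

    for edge in by_similarity:  # fill up while staying below T
        if len(kept) >= target - 1:
            break
        kept.add(edge)
    return kept


nodes = ["a", "b", "c", "d"]
weights = {
    frozenset({"a", "b"}): 0.9, frozenset({"a", "c"}): 0.8,
    frozenset({"b", "c"}): 0.7, frozenset({"c", "d"}): 0.3,
    frozenset({"b", "d"}): 0.2, frozenset({"a", "d"}): 0.1,
}
required = {frozenset({"a", "d"})}
print(sorted(sorted(e) for e in sparsify(nodes, weights, required, target=5)))
```

With T = 5 the sketch keeps four edges, including the low-similarity a-d edge because it is a metadata relation. With a very small T it still returns the three edges needed to keep all nodes connected.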
18. Query language
• Example query: exponential function
• Issue 1: What if there is a term exp instead of
exponential?
– Possible solution: dictionary of synonyms + dictionary of
acronyms and abbreviations
– Problem: Can be complicated to implement
• Issue 2: Find all exponential or logarithmic
functions
– Possible solution: submit two different queries
– Problem: Can be inconvenient for a user
19. Query language - extension
1. Operator and, marked by reserved word %AND.
2. Operator or, marked by reserved word %OR.
• Both operators have the same precedence priority.
• Expressions are evaluated from left to right.
• If there is no operator between two terms, the %AND
operation is implicitly assumed. For example, “math
function” is evaluated as “math %AND function”.
• Associativity rule is preserved from formal logic
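The implicit %AND rule can be sketched as a small tokenizer. The function name is hypothetical, and inserting %AND next to a parenthesized group (not only between two plain terms) is an assumption that goes slightly beyond the rule as stated.

```python
OPERATORS = {"%AND", "%OR"}


def tokenize(query):
    """Split a query into tokens and insert the implicit %AND operator."""
    raw = query.replace("(", " ( ").replace(")", " ) ").split()
    tokens = []
    for tok in raw:
        # A term or an opening parenthesis starts a new operand.
        starts_operand = tok == "(" or (tok not in OPERATORS and tok != ")")
        # A term or a closing parenthesis ends the previous operand.
        prev_ends_operand = (
            bool(tokens) and tokens[-1] not in OPERATORS and tokens[-1] != "("
        )
        if starts_operand and prev_ends_operand:
            tokens.append("%AND")
        tokens.append(tok)
    return tokens


print(tokenize("math function"))
print(tokenize("(a %OR b) c"))
```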
20. Query language
• How to evaluate a complex expression like
(a %OR b) %AND ((c %OR d) %AND e)
• We cannot submit such a query directly to the search
algorithm
• We need a query parsing algorithm
21. Query language - terminology
• Term (t) – word used in a query
• Simple Query (Q) – set of terms:
$Q = \{t_1, t_2, \ldots, t_{|Q|}\}$
• Expression (E) – set of simple queries:
$E = \{Q_1, Q_2, \ldots, Q_{|E|}\}$
• Operation $\wedge$ corresponds to operator %AND:
$E_1 \wedge E_2 = \{Q_i \cup Q_j \mid Q_i \in E_1, Q_j \in E_2\}$
• Operation $\vee$ corresponds to operator %OR:
$E_1 \vee E_2 = E_1 \cup E_2$
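These definitions map directly onto Python sets. In the sketch below a simple query is a `frozenset` of terms and an expression is a set of simple queries; the helper names are hypothetical.

```python
def term(t):
    """Expression consisting of one simple query with one term."""
    return {frozenset({t})}


def and_op(e1, e2):
    """%AND: merge every simple query of e1 with every simple query of e2."""
    return {q1 | q2 for q1 in e1 for q2 in e2}


def or_op(e1, e2):
    """%OR: the simple queries of both expressions side by side."""
    return e1 | e2


# (a %OR b) %AND ((c %OR d) %AND e)
left = or_op(term("a"), term("b"))
right = and_op(or_op(term("c"), term("d")), term("e"))
expression = and_op(left, right)
for simple_query in sorted(sorted(q) for q in expression):
    print(simple_query)
```

The example expression expands into four simple queries, {a, c, e}, {a, d, e}, {b, c, e} and {b, d, e}, each of which can be submitted to the search algorithm on its own.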
22. Parsing algorithm
initialize S as empty stack of expressions;
initialize empty set of search results R;
foreach token w of query
    switch(w):
        case “(”,“%AND”,“%OR”: push w to S;
        case “)”:
            E<-evaluateTopExpression(S);
            push E to S;
        default:
            if(previous token is term)
                push “%AND” to S;
            Q = {w};
            E = {Q};
            push E to S;
    end switch;
E<-evaluateTopExpression(S);
foreach simple query Q from E
    result = DBPF-k(Q);
    add result to R;

evaluateTopExpression(S)
{
    initialize SH as empty stack;
    while (S not empty)
        wh<-pop from S;
        if(wh = “(”)
            break;
        push wh to SH;
    while (true)
        first<-pop from SH;
        if (SH is empty) return first;
        operator<-pop from SH;
        second<-pop from SH;
        switch(operator)
            case “%AND”:
                result = first ^ second;
            case “%OR”:
                result = first v second;
        end switch;
        push result to SH;
}
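A runnable Python sketch of the pseudocode above, under a few stated assumptions: the function names are ours, well-formed input is assumed (balanced, non-empty parentheses), and %AND is also inserted implicitly next to a parenthesized group. The search step is left out; each returned simple query would be passed to the search algorithm.

```python
OPERATORS = {"%AND", "%OR"}


def evaluate_top_expression(stack):
    """Evaluate everything above the nearest "(" (or the whole stack)."""
    helper = []
    while stack:
        item = stack.pop()
        if item == "(":
            break
        helper.append(item)  # reversed, so helper.pop() yields the leftmost item
    result = helper.pop()
    while helper:  # equal precedence, evaluated left to right
        operator = helper.pop()
        second = helper.pop()
        if operator == "%AND":
            result = {q1 | q2 for q1 in result for q2 in second}
        else:  # "%OR"
            result = result | second
    return result


def parse(query):
    """Turn an extended query into a set of simple queries (frozensets of terms)."""
    tokens = query.replace("(", " ( ").replace(")", " ) ").split()
    stack, previous = [], None
    for tok in tokens:
        if tok in OPERATORS:
            stack.append(tok)
        elif tok == ")":
            stack.append(evaluate_top_expression(stack))
        else:  # a term or "("
            if previous is not None and previous not in OPERATORS and previous != "(":
                stack.append("%AND")  # implicit operator
            stack.append(tok if tok == "(" else {frozenset({tok})})
        previous = tok
    return evaluate_top_expression(stack)


for simple_query in sorted(sorted(q) for q in parse("(a %OR b) %AND ((c %OR d) %AND e)")):
    print(simple_query)
```

Because both operators share one precedence level and are applied left to right, `a %OR b %AND c` is read as `(a %OR b) %AND c`.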
24. Conclusion
• Proposed architectural solution for advanced
search through repositories of learning objects
• Search based on finding top-k min-cost Steiner
trees
• Proposed algorithm for sparse weighted graph
representation of a LO repository
• Proposed extension of query language based
on formal logic and designed an algorithm for
parsing it