5. Vocabularies
ws.nju.edu.cn
Scale
Gong Cheng (程龚) gcheng@nju.edu.cn 5 of 25
6. Vocabulary snippets --- state of the art
7. Vocabulary snippets --- our approach
8. Vocabulary summarization
Vocabulary summarization = ranking and selecting RDF sentences
9. Outline
Introduction
Salience measurement
Vocabulary summarization
Conclusions
10. A bipartite view of vocabulary description
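The bipartite view above can be sketched as a graph linking RDF sentences to the terms they mention. A minimal sketch, assuming an RDF sentence is represented as a set of triples and every URI appearing in a triple counts as a term; function and variable names are illustrative, not from the talk:

```python
from collections import defaultdict

def build_bipartite_graph(sentences):
    """Map each sentence index to its terms, and each term to the
    indices of the sentences that mention it."""
    sent_to_terms = {}
    term_to_sents = defaultdict(set)
    for i, triples in enumerate(sentences):
        # every position of every triple contributes a term
        terms = {t for triple in triples for t in triple}
        sent_to_terms[i] = terms
        for t in terms:
            term_to_sents[t].add(i)
    return sent_to_terms, dict(term_to_sents)

# Toy example: two one-triple "sentences" sharing the term ex:Person
s0 = {("ex:Person", "rdf:type", "owl:Class")}
s1 = {("ex:name", "rdfs:domain", "ex:Person")}
s2t, t2s = build_bipartite_graph([s0, s1])
```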
11. Surfer behavior --- type A
12. Surfer behavior --- type B
13. BipRank
[Diagram: at each current step, the surfer chooses the next step by type-A behavior, by type-B behavior, or uniformly at random]
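The BipRank surfer can be sketched as power iteration over the sentence side of the bipartite graph. The slides do not spell out the two behaviors, so this sketch assumes type-A picks the next sentence uniformly among those sharing a term, type-B weights that choice by a per-sentence probability p(s|u), and a damping factor supplies the uniform jump; all names and parameter values are assumptions:

```python
def biprank(sent_terms, term_sents, p_s, alpha=0.5, damping=0.85,
            iters=50):
    """sent_terms: {sentence: set of terms};
    term_sents: {term: set of sentences containing it};
    p_s: {sentence: weight} used by the assumed type-B behavior."""
    sents = list(sent_terms)
    n = len(sents)
    rank = {s: 1.0 / n for s in sents}
    for _ in range(iters):
        # uniform-jump (teleport) mass, as in PageRank-style surfers
        new = {s: (1 - damping) / n for s in sents}
        for s in sents:
            share = damping * rank[s]
            for t in sent_terms[s]:
                nbrs = term_sents[t]
                z = sum(p_s[v] for v in nbrs)
                for v in nbrs:
                    w_a = 1.0 / len(nbrs)          # type-A: uniform
                    w_b = p_s[v] / z if z else w_a  # type-B: weighted
                    new[v] += share / len(sent_terms[s]) * (
                        alpha * w_a + (1 - alpha) * w_b)
        rank = new
    return rank

# Toy example: sentence 0 shares a term with each of 1 and 2
st = {0: {"a", "b"}, 1: {"a"}, 2: {"b"}}
ts = {"a": {0, 1}, "b": {0, 2}}
r = biprank(st, ts, {0: 1.0, 1: 1.0, 2: 1.0})
```

With uniform p(s|u) the two behaviors coincide and the walk reduces to a standard random surfer; sentence 0, which shares terms with both others, ends up with the highest score.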
14. Pattern of RDF sentence
15. p(s|u)
Frequency of Pattern(s)
Number of RDF sentences in the vocabulary that have the same pattern
Popularity of Pattern(s)
Number of vocabularies in the repository that have the same pattern
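The two statistics can be sketched as counts over a shared pattern function. What exactly the pattern of an RDF sentence is is not defined on this slide, so the sketch assumes it keeps predicates and abstracts away subjects and objects; all names are illustrative:

```python
def pattern(triples):
    """Assumed pattern: keep predicates, replace subject/object
    with a placeholder."""
    return frozenset(("?", p, "?") for (_, p, _) in triples)

def pattern_frequency(sentence, vocabulary):
    """Number of RDF sentences in the vocabulary with the same pattern."""
    pat = pattern(sentence)
    return sum(1 for s in vocabulary if pattern(s) == pat)

def pattern_popularity(sentence, repository):
    """Number of vocabularies in the repository that contain a
    sentence with the same pattern."""
    pat = pattern(sentence)
    return sum(1 for vocab in repository
               if any(pattern(s) == pat for s in vocab))

# Toy data: two class declarations share a pattern; the label does not
v1 = [[("a", "rdf:type", "C")], [("b", "rdf:type", "D")],
      [("x", "rdfs:label", "y")]]
v2 = [[("z", "rdfs:label", "w")]]
freq = pattern_frequency(v1[0], v1)
pop = pattern_popularity(v1[2], [v1, v2])
```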
16. Evaluation setting
Test cases
9 moderate-sized vocabularies randomly selected from Falcons
Gold standard
Salience given by 6 human experts
Competitors
Cp: Zhang et al. (WWW2007)
Our approach
BipRank-U: pattern-unaware
BipRank-F: using pattern frequency
BipRank-P: using pattern popularity
Metric
Pearson product-moment correlation coefficient
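The metric above, the Pearson product-moment correlation between a system's salience scores and the experts' gold-standard scores, can be computed without external libraries; the sample scores below are made up for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

system = [0.9, 0.4, 0.7, 0.1]   # hypothetical system salience scores
experts = [0.8, 0.5, 0.6, 0.2]  # hypothetical gold-standard scores
r = pearson(system, experts)    # close to 1 means strong agreement
```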
17. Evaluation results
18. Outline
Introduction
Salience measurement
Vocabulary summarization
Conclusions
19. Goodness of a summary
Salience
Query relevance
Textual similarity between query and summary
Cohesion
Term overlap between RDF sentences
20. Looking for the best summary
Multi-objective optimization
Single aggregate objective function
Solution: a greedy strategy
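The greedy strategy can be sketched as repeatedly adding the RDF sentence with the largest marginal gain under a single weighted objective combining the three goodness criteria (salience, query relevance, cohesion). The weights and the term-overlap cohesion measure here are assumptions, not the talk's exact aggregate function:

```python
def greedy_summary(sentences, salience, relevance, terms, k,
                   w=(1.0, 1.0, 1.0)):
    """sentences: list of sentence ids; salience/relevance: {id: score};
    terms: {id: set of terms}; k: target summary size."""
    ws, wr, wc = w
    chosen = []
    while len(chosen) < k and len(chosen) < len(sentences):
        def gain(s):
            # cohesion: term overlap with sentences already chosen
            overlap = sum(len(terms[s] & terms[c]) for c in chosen)
            return ws * salience[s] + wr * relevance[s] + wc * overlap
        best = max((s for s in sentences if s not in chosen), key=gain)
        chosen.append(best)
    return chosen

# Toy example: sentence 0 is most query-relevant; sentence 1 then wins
# on salience plus its term overlap with sentence 0
summary = greedy_summary(
    [0, 1, 2],
    salience={0: 0.1, 1: 0.9, 2: 0.5},
    relevance={0: 0.9, 1: 0.0, 2: 0.1},
    terms={0: {"a"}, 1: {"a", "b"}, 2: {"c"}},
    k=2)
```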
21. Evaluation setting
Judges
18 human experts
Test cases
190 searches over 2,012 vocabularies crawled by Falcons
Competitors
Generic: Zhang et al. (WWW2007)
Our approach
QR: query relevance
QR+S: query relevance + salience
QR+C: query relevance + cohesion
Metric
Rating on a 10-point scale
22. Evaluation results
23. Performance testing
[Plots: runtime as a function of the size of the vocabulary and of the size of the summary]
24. Outline
Introduction
Salience measurement
Vocabulary summarization
Conclusions
25. Conclusions
Salience measurement
Sentence-term graph
BipRank
Pattern of RDF sentence
Vocabulary summarization
Salience
Query relevance
Cohesion
Implemented in Falcons Ontology Search
http://ws.nju.edu.cn/falcons/ontologysearch/