9. ESA-Based Data Retrieval - Example
Query: "salvaging shipwreck treasure"
“ANCIENT ARTIFACTS FOUND. Divers have recovered artifacts lying underwater for more than 2,000 years in the wreck of a Roman ship that sank in the Gulf of Baratti, 12 miles off the island of Elba, newspapers reported Saturday.”
• SHIPWRECK
• TREASURE
• MARITIME ARCHAEOLOGY
• MARINE SALVAGE
• HISTORY OF THE BRITISH VIRGIN ISLANDS
• WRECKING (SHIPWRECK)
• KEY WEST, FLORIDA
• FLOTSAM AND JETSAM
• WRECK DIVING
• SPANISH TREASURE FLEET
• SCUBA DIVING
• WRECK DIVING
• RMS TITANIC
• USS HOEL (DD-533)
• SHIPWRECK
• UNDERWATER ARCHAEOLOGY
• USS MAINE (ACR-1)
• MARITIME ARCHAEOLOGY
• TOMB RAIDER II
• USS MEADE (DD-602)
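The concept lists above come from ESA's mapping of text onto weighted Wikipedia concepts. A minimal sketch of that mapping, assuming a toy word-to-concept inverted index with made-up weights (a real ESA index is built from the full Wikipedia corpus):

```python
# Minimal ESA sketch: each word contributes weight to the Wikipedia
# concepts it is associated with; a text's concept vector is the sum of
# its words' per-concept weights.
from collections import defaultdict

# Toy inverted index: word -> {concept: tf-idf weight} (assumed values).
INVERTED_INDEX = {
    "shipwreck": {"SHIPWRECK": 0.9, "WRECK DIVING": 0.5, "MARINE SALVAGE": 0.4},
    "treasure":  {"TREASURE": 0.8, "SPANISH TREASURE FLEET": 0.6},
    "salvaging": {"MARINE SALVAGE": 0.7, "SHIPWRECK": 0.2},
}

def esa_vector(text, top_k=5):
    """Interpret text as a weighted vector of Wikipedia concepts."""
    scores = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in INVERTED_INDEX.get(word, {}).items():
            scores[concept] += weight
    # Keep only the strongest concepts, as in the lists on this slide.
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(esa_vector("salvaging shipwreck treasure"))
```

With a Wikipedia-scale index the same aggregation produces the long, partly noisy concept lists shown above.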
10. Irrelevant Docs
• ESTONIA AT THE 2000 SUMMER OLYMPICS
• ESTONIA AT THE 2004 SUMMER OLYMPICS
• 2006 COMMONWEALTH GAMES
• ESTONIA AT THE 2006 WINTER OLYMPICS
• 1992 SUMMER OLYMPICS
• ATHLETICS AT THE 2004 SUMMER OLYMPICS
• 2000 SUMMER OLYMPICS
• 2006 WINTER OLYMPICS
• CROSS-COUNTRY SKIING AT THE 2006 WINTER OLYMPICS
• NEW ZEALAND AT THE 2006 WINTER OLYMPICS
“Olympic News In Brief: Cycling win for Estonia. Erika Salumae won Estonia's first Olympic gold when retaining the women's cycling individual sprint title she won four years ago in Seoul as a Soviet athlete.”
Query: "Estonia Economy"
• ESTONIA
• ECONOMY OF ESTONIA
• ESTONIA AT THE 2000 SUMMER OLYMPICS
• ESTONIA AT THE 2004 SUMMER OLYMPICS
• ESTONIA NATIONAL FOOTBALL TEAM
• ESTONIA AT THE 2006 WINTER OLYMPICS
• BALTIC SEA
• EUROZONE
• TIIT VÄHI
• MILITARY OF ESTONIA
11. Selecting Query Features
• Selection could remove noisy ESA concepts
• However, the IR task provides no training data…
[Diagram: f = ESA(q) → Filter → f′. The filter's utility function U(+|−) requires a target measure, i.e., a training set.]
• Focus on query concepts: the query is short and noisy, while feature selection at indexing time lacks the query's context
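The filtering step on this slide (f = ESA(q), then a filter driven by utility U) can be sketched as follows. Since the IR task has no labeled training data, pseudo-relevance feedback supplies it: top-ranked documents act as positives, low-ranked ones as negatives. The utility function, the set-of-concepts document representation, and all data below are toy assumptions, not the actual measure:

```python
# Sketch of the pipeline f = ESA(q) -> Filter(U) -> f'.
# Concepts frequent in pseudo-positive documents score high; concepts
# frequent in pseudo-negative documents score low and are filtered out.

def utility(concept, positives, negatives):
    """Toy utility U: the concept's frequency in pseudo-positive
    documents minus its frequency in pseudo-negative ones."""
    pos = sum(concept in d for d in positives) / len(positives)
    neg = sum(concept in d for d in negatives) / len(negatives)
    return pos - neg

def filter_features(concepts, positives, negatives, keep=3):
    """Keep the query's ESA concepts with the highest utility (f')."""
    return sorted(concepts,
                  key=lambda c: -utility(c, positives, negatives))[:keep]

# Documents represented as sets of ESA concepts (toy data).
positives = [{"SHIPWRECK", "TREASURE"}, {"SHIPWRECK", "MARINE SALVAGE"}]
negatives = [{"TOMB RAIDER II"}, {"KEY WEST, FLORIDA"}]
f = ["SHIPWRECK", "TREASURE", "TOMB RAIDER II", "KEY WEST, FLORIDA"]
print(filter_features(f, positives, negatives))
```

Here the noisy concepts picked up from the query text (e.g. TOMB RAIDER II) sink to the bottom of the ranking and drop out of f′.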
13. ESA Feature Selection Methods
• IG – compute each feature's Information Gain in separating positive from negative examples, and keep the best-performing features
• IIG – add concepts from the positive examples to the candidate features, and re-weight all features by their weights in the examples
• RV – find the subset of features that best separates positive from negative examples, using heuristic search
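The IG method above can be sketched as follows: score the binary feature "document contains this concept" by how much it reduces the entropy of the positive/negative split. The document representation and toy data are assumptions for illustration:

```python
import math

def entropy(pos, neg):
    """Entropy (in bits) of a set with pos positive and neg negative items."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def information_gain(feature, positives, negatives):
    """IG of the split 'document contains feature' on the labeled examples."""
    with_pos = sum(feature in d for d in positives)
    with_neg = sum(feature in d for d in negatives)
    without_pos = len(positives) - with_pos
    without_neg = len(negatives) - with_neg
    n = len(positives) + len(negatives)
    base = entropy(len(positives), len(negatives))
    cond = 0.0  # entropy remaining after the split, weighted by branch size
    if with_pos + with_neg:
        cond += (with_pos + with_neg) / n * entropy(with_pos, with_neg)
    if without_pos + without_neg:
        cond += (without_pos + without_neg) / n * entropy(without_pos, without_neg)
    return base - cond

# Toy pseudo-relevance examples: documents as sets of ESA concepts.
positives = [{"SHIPWRECK", "TREASURE"}, {"SHIPWRECK"}]
negatives = [{"TOMB RAIDER II"}, {"TREASURE"}]
print(information_gain("SHIPWRECK", positives, negatives))  # perfectly separates
print(information_gain("TREASURE", positives, negatives))   # uninformative
```

Features are then ranked by IG and only the top-scoring ones are kept as query concepts.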
16. Conclusion
• MORAG: a new methodology for concept-based information retrieval
• Documents and queries are enhanced with Wikipedia concepts
• Informative features are selected using pseudo-relevance feedback
• The generated features improve the performance of BOW-based systems