1. > design > publish > search!
How to Search Annotated Text
by Strategy?
Roberto Cornacchia
Wouter Alink
Arjen P. De Vries
Spinque B.V.
CLIN 2013, 18 January 2013
http://www.spinque.com/
2. Search by Strategy
Design the way you would like to search
● A search engine design framework
● Custom search engines built from “Strategies”, which:
  ● are designed as graphs
  ● abstract data processing
  ● combine different data sources
  ● incorporate probabilistic reasoning
  ● translate to database queries
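To make "designed as graphs" and "translate to database queries" concrete, here is a minimal Python sketch. It is not Spinque's implementation: the `Block` class, the `houses` table, its rows, and the 0.5 weight are all hypothetical, chosen only to show how a graph of blocks could compile into one nested SQL query.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field


@dataclass
class Block:
    """One node of a strategy graph (hypothetical structure, not Spinque's API)."""
    name: str
    sql_template: str                      # {0}, {1}, ... stand for the input blocks
    inputs: list[Block] = field(default_factory=list)

    def to_sql(self) -> str:
        # Each input block becomes a parenthesised subquery of this block.
        subqueries = [f"({block.to_sql()})" for block in self.inputs]
        return self.sql_template.format(*subqueries)


# A three-block strategy: source -> select on attribute -> re-weight.
source = Block("All houses", "SELECT obj, city, p FROM houses")
select = Block(
    "Select on attribute",
    "SELECT obj, city, p FROM {0} AS src WHERE city = 'Utrecht'",
    [source],
)
weigh = Block("Down-weight", "SELECT obj, p * 0.5 AS p FROM {0} AS src", [select])


def run_strategy(strategy: Block) -> list[tuple]:
    """Run the compiled query against a small in-memory table of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE houses (obj TEXT, city TEXT, p REAL)")
    conn.executemany(
        "INSERT INTO houses VALUES (?, ?, ?)",
        [("h1", "Utrecht", 0.8), ("h2", "Delft", 0.9), ("h3", "Utrecht", 0.4)],
    )
    rows = conn.execute(strategy.to_sql() + " ORDER BY p DESC").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(weigh.to_sql())
    print(run_strategy(weigh))
```

The point of the sketch is that the designer only wires blocks together; the SQL is derived from the graph.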
3. Search by Strategy
Don't try and program the ultimate search engine
Design a number of domain-specific search strategies
[Strategy graph. Sources: Crime map, All houses, Query terms. Blocks: Rank on location (twice), Select on attribute, Rank full-text, Difference, Union.]
Click. Generate Web search engines on probabilistic DB
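The Difference and Union blocks in the graph operate on scored results. The slides do not give their formulas, so the sketch below assumes the common independence-based semantics of probabilistic relational algebra; the function names, house identifiers, and scores are hypothetical, and the wiring is not taken from the diagram.

```python
def p_union(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Probabilistic union, assuming independent events: 1 - (1 - p)(1 - q)."""
    return {
        key: 1.0 - (1.0 - a.get(key, 0.0)) * (1.0 - b.get(key, 0.0))
        for key in a.keys() | b.keys()
    }


def p_difference(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Probabilistic difference, assuming independent events: p * (1 - q)."""
    return {key: p * (1.0 - b.get(key, 0.0)) for key, p in a.items()}


def ranked(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort results by descending probability."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: houses matching the query terms, and houses near reported crime.
matches_query = {"h1": 0.5, "h2": 0.5, "h3": 0.5}
near_crime = {"h1": 0.9, "h2": 0.2}

# Houses that match the query and are unlikely to be near crime.
quiet_matches = p_difference(matches_query, near_crime)

if __name__ == "__main__":
    print(ranked(quiet_matches))
```

Under these assumptions, a house that strongly matches the crime map is pushed down the ranking rather than removed outright, which is what distinguishes a probabilistic difference from a Boolean one.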
10. What's in the DB?
Full-text search
term  obj  freq
t0    o3   0.03
t0    o5   0.21
t1    o2   0.08

Annotation search
subj     pred / attr  obj / val  p
Roberto  speaks_to    You        0.95
You      listen_to    Roberto    0.6
speech   minutes      15         0.8

Feature-vectors (CBIR, SVM)
obj  f1    ...  fN
o0   0.12  ...  0.84
o1   0.54  ...  0
o2   0.23  ...  0.31

Hierarchical search
obj  pre  size  level
o0   100  50    0
o1   110  20    1
o2   144  16    2
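As an illustration of how the full-text and annotation tables could be queried, the sketch below loads the rows shown on this slide into an in-memory SQLite database. The table names `term_freq` and `triples` are assumptions made for the example; the slides do not name the tables or the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Full-text table: one row per (term, object) with a frequency score.
conn.execute("CREATE TABLE term_freq (term TEXT, obj TEXT, freq REAL)")
conn.executemany(
    "INSERT INTO term_freq VALUES (?, ?, ?)",
    [("t0", "o3", 0.03), ("t0", "o5", 0.21), ("t1", "o2", 0.08)],
)

# Annotation table: subject / predicate-or-attribute / object-or-value triples,
# each with a probability p.
conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT, p REAL)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?)",
    [
        ("Roberto", "speaks_to", "You", 0.95),
        ("You", "listen_to", "Roberto", 0.6),
        ("speech", "minutes", "15", 0.8),
    ],
)

# Full-text search: rank objects containing term t0 by frequency.
ranked_objects = conn.execute(
    "SELECT obj, freq FROM term_freq WHERE term = ? ORDER BY freq DESC", ("t0",)
).fetchall()

# Annotation search: who speaks to whom, and how certain is it?
speakers = conn.execute(
    "SELECT subj, obj, p FROM triples WHERE pred = ?", ("speaks_to",)
).fetchall()

if __name__ == "__main__":
    print(ranked_objects)
    print(speakers)
```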
11. Choose hot topics from (kid-)news
http://www.opstel.eu
[Strategy diagram. Blocks: Kid news, Rank on date, Extract terms, Expand.]
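The blocks named on this slide (rank on date, extract terms, expand) can be sketched in a few lines of Python. This is only an illustration under assumptions: the news items are made up, terms are split on whitespace, and "hot" simply means most frequent among the most recent items; none of this is stated in the slides.

```python
from collections import Counter
from datetime import date

# Made-up news items: (publication date, text).
news = [
    (date(2013, 1, 15), "pitbull held hond"),
    (date(2013, 1, 14), "hond sneeuw"),
    (date(2012, 6, 1), "voetbal"),
]


def hot_terms(items, top_items=2, top_terms=2):
    """Rank items on date, then extract the most frequent terms of the newest ones."""
    recent = sorted(items, key=lambda item: item[0], reverse=True)[:top_items]
    counts = Counter(term for _, text in recent for term in text.split())
    return [term for term, _ in counts.most_common(top_terms)]


def expand(query, items):
    """Append hot terms that the query does not already contain."""
    query_terms = query.split()
    extra = [term for term in hot_terms(items) if term not in query_terms]
    return query_terms + extra


if __name__ == "__main__":
    print(hot_terms(news))
    print(expand("hond nieuws", news))
```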
12. Use POS annotations
Text
<abstract date="2013-01-15">
Lilly de pitbull is een held. De hond uit
de Amerikaanse staat Massachusetts heeft …
</abstract>
Annotated text: we are interested in NPs
<abstract date="2013-01-15">
<NP>Lilly de pitbull</NP> is <NP>een held</NP>.
<NP>De hond uit de Amerikaanse staat
Massachusetts</NP> heeft …
</abstract>
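A short sketch of pulling the NPs out of the annotated abstract, using Python's standard `xml.etree.ElementTree`. The slides do not say how the annotations are actually loaded, so this only shows one plausible way to read them.

```python
import xml.etree.ElementTree as ET

annotated = """<abstract date="2013-01-15">
<NP>Lilly de pitbull</NP> is <NP>een held</NP>.
<NP>De hond uit de Amerikaanse staat
Massachusetts</NP> heeft …
</abstract>"""

root = ET.fromstring(annotated)

# Collect the text of every NP element, collapsing line breaks inside a phrase.
noun_phrases = [" ".join("".join(np.itertext()).split()) for np in root.iter("NP")]

# The date attribute stays available for ranking on date.
abstract_date = root.get("date")

if __name__ == "__main__":
    print(abstract_date)
    for phrase in noun_phrases:
        print(phrase)
```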
16. Topic suggestion for kids
● Data: Wikipedia, magazines for children, ...
● Left branch: rank data sources on annotations, e.g.:
  ● Most seen content – hot topics
  ● Seen during night-time? Probably not for kids
● Right branch: query expansion using recent (hot) content
● Can we improve this by adding...?
  ● Text reading level (machine learning)
  ● Handle spelling mistakes in query expansion
  ● Syntactic dependencies
17. Example: syntactic dependencies
AEGIR dependency parser for English (Koster et al.)
Parses text, outputs dependency triples
"PGs prevent the mucosal damage .. "
[PG,SUBJ,prevent]
[prevent,OBJ,damage]
[damage,ATTR,mucosal]
...
Eva D'hondt, Suzan Verberne, Wouter Alink, Roberto Cornacchia. Combining document representations for prior-art retrieval. CLEF-IP 2011.
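Dependency triples in the bracketed form shown above are easy to turn into rows for an annotation table. The sketch below parses only that textual form; it does not call AEGIR, and the field names `left`, `relation`, and `right` are neutral placeholders chosen for the example rather than the parser's own terminology.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    left: str
    relation: str
    right: str


# Matches "[a,REL,b]" where no field contains a comma or a bracket.
TRIPLE_RE = re.compile(r"\[([^,\[\]]+),([^,\[\]]+),([^,\[\]]+)\]")


def parse_triples(text: str) -> list[Triple]:
    """Extract every bracketed triple from the parser output, in order."""
    return [
        Triple(*(part.strip() for part in match.groups()))
        for match in TRIPLE_RE.finditer(text)
    ]


parser_output = """
[PG,SUBJ,prevent]
[prevent,OBJ,damage]
[damage,ATTR,mucosal]
"""

triples = parse_triples(parser_output)

# Example lookup: everything attached to "damage" by any relation.
about_damage = [t for t in triples if "damage" in (t.left, t.right)]

if __name__ == "__main__":
    for triple in triples:
        print(triple)
```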
18. > design > publish > search!
Prior art search.
Designed by Eva D'hondt, Nijmegen