Comparing Cross-Language Retrieval Tools at CLEF-IP
1. Chances and Challenges in Comparing
Cross-Language Retrieval Tools
Giovanna Roda
Vienna, Austria
IRF Symposium 2010 / June 3, 2010
6. CLEF-IP: the Intellectual Property track at CLEF
CLEF-IP is an evaluation track within the Cross-Language Evaluation Forum (CLEF). [1]
organized by the IRF
first track ran in 2009
running this year for the second time
[1] http://www.clef-campaign.org
11. What is an evaluation track?
An evaluation track in Information Retrieval is a cooperative effort aimed at comparing different techniques on a common retrieval task.
produces experimental data that can be analyzed and used to
improve existing systems
fosters exchange of ideas and cooperation
produces a reusable test collection, sets milestones
Test collection
A test collection consists traditionally of target data, a set of
queries, and relevance assessments for each query.
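To make the notion of a test collection concrete, here is a minimal sketch in Python (not the actual CLEF-IP evaluation scripts; the identifiers and data are invented) of scoring one run against the relevance assessments with a standard measure, average precision:

```python
# Minimal sketch (hypothetical data, not CLEF-IP tooling): scoring one run
# against a test collection's relevance assessments with average precision.

def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list for one query."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# qrels: relevance assessments, one entry per query (topic patent)
qrels = {"EP-0001": {"EP-1234", "EP-5678"}}
# run: the ranked documents a system returned for each query
run = {"EP-0001": ["EP-9999", "EP-1234", "EP-5678"]}

map_score = sum(
    average_precision(run.get(q, []), rel) for q, rel in qrels.items()
) / len(qrels)
print(f"MAP = {map_score:.3f}")
```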
13. CLEF-IP 2009: the task
The main task in the CLEF-IP track was to find prior art for a given patent.
Prior art search
Prior art search consists of identifying all information (including non-patent literature) that might be relevant to a patent’s claim of novelty.
21. Participants - 2009 track
1 Tech. Univ. Darmstadt, Dept. of CS,
Ubiquitous Knowledge Processing Lab (DE)
2 Univ. Neuchatel - Computer Science (CH)
3 Santiago de Compostela Univ. - Dept.
Electronica y Computacion (ES)
4 University of Tampere - Info Studies (FI)
5 Interactive Media and Swedish Institute of
Computer Science (SE)
6 Geneva Univ. - Centre Universitaire
d’Informatique (CH)
7 Glasgow Univ. - IR Group Keith (UK)
8 Centrum Wiskunde & Informatica - Interactive
Information Access (NL)
28. Participants - 2009 track
9 Geneva Univ. Hospitals - Service of Medical
Informatics (CH)
10 Humboldt Univ. - Dept. of German Language
and Linguistics (DE)
11 Dublin City Univ. - School of Computing (IE)
12 Radboud Univ. Nijmegen - Centre for Language
Studies & Speech Technologies (NL)
13 Hildesheim Univ. - Information Systems &
Machine Learning Lab (DE)
14 Technical Univ. Valencia - Natural Language
Engineering (ES)
15 Al. I. Cuza University of Iasi - Natural Language
Processing (RO)
43. 2009-2010: evolution of the CLEF-IP track
2009                           | 2010
1 task: prior art search       | prior art candidate search and classification task
targeting granted patents      | patent applications
15 participants                | 20 participants
all from academia              | 4 industrial participants
families and citations         | include forward citations
manual assessments             | expanded lists of relevant docs
standard evaluation measures   | new measure: PRES, more recall-oriented
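The 2010 track adds PRES, a recall-oriented measure. Without reproducing the exact PRES formula here, the following sketch illustrates what "recall-oriented" means in practice using plain recall within a fixed cutoff; it is a simplified stand-in with invented data, not the measure used by the track:

```python
# Illustrative sketch of a recall-oriented statistic (plain recall@N_max),
# shown to contrast with precision-oriented measures; this is NOT the PRES
# formula used by CLEF-IP 2010, only a simplified stand-in.

def recall_at(ranked_ids, relevant_ids, n_max=1000):
    """Fraction of relevant documents found within the first n_max results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    found = relevant.intersection(ranked_ids[:n_max])
    return len(found) / len(relevant)

# Hypothetical example: 2 of 3 relevant patents retrieved within the cutoff.
print(recall_at(["EP-A", "EP-B", "EP-C"], {"EP-B", "EP-C", "EP-D"}, n_max=100))
# -> 0.666...
```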
48. What are relevance assessments?
A test collection (also known as a gold standard) consists of a target
dataset, a set of queries, and relevance assessments corresponding
to each query.
The CLEF-IP test collection:
target data: 2 million EP patents
queries: full-text patents (without images)
relevance assessments: extended citations
53. Relevance assessments
We used patents cited as prior art as relevance assessments.
Sources of citations:
1 applicant’s disclosure: the USPTO requires applicants to
disclose all known relevant publications
2 patent office search report: each patent office will do a search
for prior art to judge the novelty of a patent
3 opposition procedures: patents cited to prove that a granted
patent is not novel
59. Patent families
A patent family consists of patents granted by different patent
authorities but related to the same invention.
simple family: all family members share the same priority number
extended family: there are several definitions; in the INPADOC database, all documents that are directly or indirectly linked via a priority number belong to the same family
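As a small illustration of the two definitions, here is a self-contained sketch (with made-up document and priority identifiers, not an EPO or INPADOC tool): simple families group documents with identical priority sets, while extended families are the connected components of the priority-link graph.

```python
# Minimal sketch (hypothetical data, not an EPO/INPADOC tool): grouping
# patent documents into families from their priority numbers.
from collections import defaultdict
from itertools import combinations

# Each document lists the priority numbers it claims (made-up identifiers).
priorities = {
    "EP-1": ["P-100"],
    "US-1": ["P-100"],           # identical priorities as EP-1 -> same simple family
    "JP-1": ["P-100", "P-200"],  # shares P-100, adds P-200
    "EP-2": ["P-200"],           # linked to the others only indirectly via P-200
}

# Simple family: documents whose priority sets are identical.
simple = defaultdict(set)
for doc, prios in priorities.items():
    simple[frozenset(prios)].add(doc)

# Extended (INPADOC-style) family: documents directly or indirectly linked
# through shared priority numbers, i.e. connected components (union-find).
parent = {d: d for d in priorities}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

by_prio = defaultdict(list)
for doc, prios in priorities.items():
    for p in prios:
        by_prio[p].append(doc)
for docs in by_prio.values():
    for a, b in combinations(docs, 2):
        union(a, b)

extended = defaultdict(set)
for doc in priorities:
    extended[find(doc)].add(doc)

print(dict(simple))              # e.g. {frozenset({'P-100'}): {'EP-1', 'US-1'}, ...}
print(list(extended.values()))   # one component containing all four documents
```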
67. Relevance assessments 2010
Expanding the 2009 extended citations:
1 include citations of forward citations ...
2 ... and their families
This is apparently a well-known method among patent searchers.
Zig-zag search?
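A rough sketch of this expansion, assuming hypothetical citation and family lookup tables (this is not the actual CLEF-IP assessment pipeline): start from the topic's direct citations, add what its forward citations cite, then add family members.

```python
# Minimal sketch (hypothetical data and helper maps, not the CLEF-IP scripts):
# expanding direct citations with citations-of-forward-citations and families,
# in the spirit of the 2010 relevance assessments.

cites = {               # patent -> patents it cites (backward citations)
    "TOPIC": {"A"},
    "F1": {"A", "B"},   # F1 cites the same art as TOPIC plus B
}
cited_by = {"TOPIC": {"F1"}}        # patent -> patents citing it (forward citations)
family_of = {"A": {"A", "A2"}, "B": {"B"}, "F1": {"F1"}}  # simple-family lookup

def expanded_citations(topic):
    relevant = set(cites.get(topic, set()))     # direct citations
    for fwd in cited_by.get(topic, set()):      # forward citations...
        relevant |= cites.get(fwd, set())       # ...and what they cite
    for doc in list(relevant):                  # add family members
        relevant |= family_of.get(doc, {doc})
    return relevant - {topic}

print(expanded_citations("TOPIC"))   # {'A', 'A2', 'B'} (set order may vary)
```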
68. How good are the CLEF-IP relevance assessments?
CLEF-IP uses families + citations:
how complete are extended citations as relevance assessments?
will every prior art patent be
included in this set?
and if not, what percentage
of prior art items are captured
by extended citations?
when considering forward
citations, how good are
extended citations as a prior
art candidate set?
73. Feedback from patent experts needed
Quality of prior art candidate sets has to be assessed
at CLEF-IP 2009, 7 patent search professionals assessed 12 search results
the task was not well defined and there were misunderstandings about the concept of relevance
the amount of data was not sufficient to draw conclusions
81. Some initiatives associated with CLEF-IP
The results of evaluation tracks are mostly useful for the research community.
This community often produces prototypes that are of little interest to the end-user.
Next I'd like to present two concrete outcomes, not of CLEF-IP directly but arising from work in patent retrieval evaluation.
87. Soire
developed at Matrixware
service-oriented architecture - available as a Web service
allows replicating IR experiments based on the classical evaluation model
tested on the CLEF-IP data
customized for the evaluation of machine translation
92. Spinque
a spin-off (2010) from CWI (the Dutch National Research
Center in Computer Science and Mathematics)
introduces search-by-strategy
provides optimized strategies for patent search - tested on
CLEF-IP data
transparency: understand your search results to improve
strategy
98. CLEF-IP 2009 learnings
Humboldt University implemented a model for patent search that produced the best results.
The model combined several strategies:
using metadata (IPC, ECLA)
indexes built at lemma level
an additional phrase index for English
a cross-lingual concept index (multilingual terminological database)
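As a very rough illustration of the general idea of combining several indexes with classification metadata (this is not the Humboldt system; the scoring functions, weights, and toy data below are invented):

```python
# Toy, self-contained sketch: combine evidence from a lemma index, an English
# phrase index, and a cross-lingual concept index, then boost candidates that
# share IPC classes with the topic patent. All data and weights are invented.

docs = {
    "EP-1": {"lemmas": {"fuel", "cell", "membrane"}, "phrases": {"fuel cell"},
             "concepts": {"C-FUELCELL"}, "ipc": {"H01M"}},
    "EP-2": {"lemmas": {"bicycle", "frame"}, "phrases": {"bicycle frame"},
             "concepts": {"C-BICYCLE"}, "ipc": {"B62K"}},
}

topic = {"lemmas": {"fuel", "cell", "electrode"}, "phrases": {"fuel cell"},
         "concepts": {"C-FUELCELL"}, "ipc": {"H01M"}}

def overlap(a, b):
    """Simple overlap score standing in for a real ranking function."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def combined_score(topic, doc, weights=(0.5, 0.2, 0.3)):
    w_lemma, w_phrase, w_concept = weights
    score = (
        w_lemma * overlap(topic["lemmas"], doc["lemmas"])
        + w_phrase * overlap(topic["phrases"], doc["phrases"])
        + w_concept * overlap(topic["concepts"], doc["concepts"])
    )
    # Metadata: boost candidates that share IPC classes with the topic.
    return score * (1.0 + 0.1 * len(topic["ipc"] & doc["ipc"]))

ranking = sorted(docs, key=lambda d: combined_score(topic, docs[d]), reverse=True)
print(ranking)  # ['EP-1', 'EP-2'] with this toy data
```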
100. Some additional investigations
Some citations were hard to find.
% runs       | class
≤ 5          | hard
5 < x ≤ 10   | very difficult
10 < x ≤ 50  | difficult
50 < x ≤ 75  | medium
75 < x ≤ 100 | easy