Cross-Language Information Retrieval (CLIR) slides
1. Cross Language Information Retrieval (CLIR)
INFORMATION SEARCHING AND RETRIEVAL (MLS 712)
PREPARED FOR:
ASSOC. PROF. HAJAH FUZIAH MOHD NADZAR
PREPARED BY:
ASYURA BINTI AMINORDIN (2012482362)
MOHD IQBAL AL-FARABI B YAHYA (2012253658)
DATE: DECEMBER 17, 2012
2. Introduction
Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. For example, a user may pose their query in English but retrieve relevant documents written in French.
http://en.wikipedia.org/wiki/Cross-language_information_retrieval
4. English-Chinese Information Retrieval System (ECIRS)
ECIRS is a web-based English-Chinese Information Retrieval System. It provides a cross-language platform that helps people retrieve Chinese information without entering a Chinese query. The web-based client-server architecture allows more users to access ECIRS over the Internet.
5. ECIRS (continued)
ECIRS consists of a client side and a server side. The client side is a web-based user interface. The server side includes bilingual dictionaries, content-based document index files, a Chinese search engine and Chinese document collections.
6. ECIRS (continued)
Client side: allows a user to input a query in English and send it to the server side; the result returned is an entry list of relevant documents in Chinese.
Server side: an English-Chinese dictionary and a Chinese-English dictionary are used to translate the user's query from English into Chinese keywords in ECIRS.
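The dictionary-based query translation described for the server side can be sketched as follows. This is only an illustration, not the ECIRS implementation: the dictionary entries, the function name translate_query and the policy of dropping unknown words are all assumptions made for the example.

```python
# Toy English-to-Chinese dictionary (hypothetical entries, for illustration only).
EN_ZH = {
    "computer": ["计算机", "电脑"],
    "network": ["网络"],
}


def translate_query(query, dictionary):
    """Replace each English query word with its Chinese dictionary translations.

    Words missing from the dictionary are simply dropped in this sketch;
    a real system would need a policy for out-of-vocabulary terms.
    """
    keywords = []
    for word in query.lower().split():
        keywords.extend(dictionary.get(word, []))
    return keywords


# The Chinese keywords would then be passed to the Chinese search engine.
chinese_keywords = translate_query("Computer network", EN_ZH)
print(chinese_keywords)
```

Keeping every translation of an ambiguous word, as done here, favours recall; choosing a single translation would need extra context that a plain dictionary lookup does not provide.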
7. English-Chinese Information Retrieval
Screenshot of the English-Chinese Information Retrieval System layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
8. English-Chinese Information Retrieval
Side bar of the system, where the user can choose any of the buttons provided. Example: the On-line English Chinese Dictionary allows the user to translate an English word into a Chinese word.
9. English-Chinese Information Retrieval
Keyword: computer
In the screenshot above, we enter the keyword we want to search for. Example: Computer
Screenshot of the English-Chinese Information Retrieval System layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
10. English-Chinese Information Retrieval
Translation from English into Chinese
Screenshot of the English-Chinese Information Retrieval System layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
11. English-Chinese Information Retrieval
On-Line Chinese Information Retrieval System: the database of all documents or information related to the information need, which here is “Computer”.
12. English-Chinese Information Retrieval
The list of documents related to the keyword “computer”. There were 294 results.
Screenshot of the English-Chinese Information Retrieval System layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
13. English-Chinese Information Retrieval
Screenshot of the English-Chinese Information Retrieval System layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
14. Big 5 - GB
Big 5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters.
GB (Guojia Biaozhun, 国家标准, "national standard") is the registered internet name for a key official character set of the People's Republic of China, used for Simplified Chinese characters.
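The difference between the two encodings can be seen with Python's built-in codecs. The sketch below assumes the "big5" and "gb2312" codec names from the Python standard library; the sample words and the helper convert_encoding are chosen only for illustration.

```python
text_traditional = "電腦"  # "computer" written in Traditional Chinese
text_simplified = "电脑"   # "computer" written in Simplified Chinese

# Each encoding maps its characters to its own byte sequences.
big5_bytes = text_traditional.encode("big5")
gb_bytes = text_simplified.encode("gb2312")

# Decoding with the matching codec recovers the original text.
assert big5_bytes.decode("big5") == text_traditional
assert gb_bytes.decode("gb2312") == text_simplified


def convert_encoding(data, source, target):
    """Re-encode bytes from one character encoding to another via Unicode."""
    return data.decode(source).encode(target)


# "中文" ("Chinese language") uses characters present in both character sets,
# so its Big 5 bytes can be re-encoded as GB bytes.
common = "中文"
converted = convert_encoding(common.encode("big5"), "big5", "gb2312")
print(converted.decode("gb2312"))
```

Re-encoding only changes the byte representation. It does not convert Traditional characters into Simplified ones, which is a separate mapping problem.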
15. Cross Language Information Retrieval
Layout of a website that people use to book hotels and flights for travel.
23. CINDOR (Conceptual Interlingua Document Retrieval)
CINDOR is a cross-language text retrieval system capable of accepting a user's query stated in their native language and then seamlessly searching, retrieving, relevance ranking and displaying documents written in a variety of foreign languages.
CINDOR allows users to state queries in any of the supported languages (currently English, French, Spanish, and Japanese) and to search and retrieve documents in any of the supported languages.
It adopts a ‘Conceptual Interlingua’: a unique approach to cross-language information management based on a language-independent conceptual representation.
24. CINDOR
The ‘conceptual’ resource of the conceptual interlingua:
The concept “elasticity: the tendency of a body to return to its original shape after it has been stretched or compressed”, which has the label 131186, is instantiated in English and French:
131186 spring, give, springiness
131186 élasticité, flexibilité, moëlleux
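The idea of a language-independent concept label can be sketched as a small lookup structure. This is not CINDOR's actual data format or API: the dictionary layout, the language codes "en" and "fr", and both function names are assumptions; only the concept 131186 and its terms come from the slide.

```python
# Hypothetical miniature interlingua: concept label -> terms per language.
INTERLINGUA = {
    131186: {
        "en": ["spring", "give", "springiness"],
        "fr": ["élasticité", "flexibilité", "moëlleux"],
    },
}


def concepts_for(term, lang, interlingua):
    """Return the labels of all concepts that list `term` in language `lang`."""
    return sorted(
        label
        for label, terms in interlingua.items()
        if term in terms.get(lang, [])
    )


def translate_via_concepts(term, source_lang, target_lang, interlingua):
    """Map a term to its concepts, then to the target-language terms."""
    translations = []
    for label in concepts_for(term, source_lang, interlingua):
        translations.extend(interlingua[label].get(target_lang, []))
    return translations


print(concepts_for("springiness", "en", INTERLINGUA))
print(translate_via_concepts("springiness", "en", "fr", INTERLINGUA))
```

Because queries and documents are both reduced to concept labels, adding a new language in this scheme means adding terms under existing labels rather than building a dictionary for every language pair.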
25. The Eurovision St Andrews Photographic Collection
The site presents the collection in a variety of ways: full-text search, or browsing a list of 999 pre-defined index terms organised alphabetically and hierarchically via a categories page.
The St Andrews collection (SAC) consists of 28,133 thumbnail images (around 120x76 pixels), larger versions of these images (around 368x234 pixels), and associated captions, giving a total of 84,399 files (3 x 28,133) in the main body of the collection.
26. Eurovision
Photograph metadata:
(1) a unique record number,
(2) a short title,
(3) a full title,
(4) a textual description of the image content,
(5) the date when the photograph was taken (most frequently with the day, month and year),
(6) the originator, i.e. the name of an individual or company to which the photograph is attributed,
(7) the location of the photograph (e.g. the county and the country), and
(8) a line for notes to offer additional information about the photograph
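The eight metadata fields listed above could be modelled as a simple record type. The class name, field names, field types and the sample values below are all hypothetical; the slides only name the fields, not how they are stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhotographRecord:
    """One caption record, with one attribute per metadata field (1) to (8)."""
    record_number: str
    short_title: str
    full_title: str
    description: str
    date_taken: Optional[str]  # day, month and year when known
    originator: str
    location: str
    notes: str = ""

    def searchable_text(self):
        """Join the free-text fields that a full-text search could index."""
        parts = [self.short_title, self.full_title, self.description,
                 self.location, self.notes]
        return " ".join(part for part in parts if part)


# Invented sample values, for illustration only.
record = PhotographRecord(
    record_number="0001",
    short_title="Harbour view",
    full_title="View of the harbour from the pier",
    description="Fishing boats moored in a harbour",
    date_taken=None,
    originator="Unknown photographer",
    location="Fife, Scotland",
)
print(record.searchable_text())
```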
27. Eurovision
The St Andrews collection has been used for bilingual ad-hoc retrieval, where queries typical of this kind of historic collection have been generated in English and translated into languages including a range of Indo-European, Asian and Romance languages.
Challenges include:
Captions that are short in length, increasing the likelihood of vocabulary mismatch.
Captions with text not directly associated with the visual content of an image (e.g. expressing something in the background).
The use of colloquial and domain-specific language in the caption (i.e. British English).
30. CLIR University of Indonesia
Query expansion technique: pseudo relevance feedback
It assumes that the top few documents initially retrieved are indeed relevant to the query, and so they must contain other terms that are also relevant to the query.
To choose the relevant terms from the top-ranked documents, we used the tf*idf term weighting formula.
We added a certain number of noun terms that have the highest weight scores.
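The pseudo relevance feedback steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are pre-tokenised lists of words, tf is the term count within the top-ranked documents, idf is log(N / df) over the whole collection, and the noun filter is a plain set supplied by the caller rather than a part-of-speech tagger. The function name and all parameter names are hypothetical.

```python
import math
from collections import Counter


def expand_query(query_terms, ranked_docs, collection,
                 top_k=2, n_terms=2, nouns=None):
    """Add the n_terms highest tf*idf terms from the top_k ranked documents."""
    n_docs = len(collection)

    # Document frequency of every term over the whole collection.
    df = Counter()
    for doc in collection:
        df.update(set(doc))

    # Term frequency inside the documents assumed to be relevant.
    tf = Counter()
    for doc in ranked_docs[:top_k]:
        tf.update(doc)

    scores = {}
    for term, freq in tf.items():
        if term in query_terms:
            continue  # already part of the query
        if nouns is not None and term not in nouns:
            continue  # keep noun terms only when a noun set is given
        scores[term] = freq * math.log(n_docs / max(df[term], 1))

    # Highest score first; ties are broken alphabetically.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]
    return list(query_terms) + best


collection = [
    ["olympic", "games", "athens", "city"],
    ["olympic", "athens", "stadium"],
    ["city", "council", "budget"],
    ["football", "stadium", "city"],
    ["weather", "rain"],
]
# Pretend the first two documents were ranked highest for the query.
print(expand_query(["olympic"], collection[:2], collection))
```

The expanded query is then run again against the collection. If the top-ranked documents are not actually relevant, the added terms can pull the query off topic, which is the main risk of this technique.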
33. INFOMAP
Chinese question classification is the process that analyzes a question and labels it based on its question type and expected answer type.
We adopt the INFOMAP inference engine to support the knowledge-based approach for Chinese questions, which can be formulated as templates, and use SVM (Support Vector Machines) as the machine learning approach for large collections of labeled Chinese questions.
INFOMAP is a knowledge representation framework that extracts important concepts from natural language text.
A feature of INFOMAP is its capability to represent and match complicated template structures, such as hierarchical matching, regular expressions, semantic template matching, frame (non-linear relations) matching, and graph matching.
Using INFOMAP, we can identify the question category from a Chinese question.
34. Example
Question: (In which city were the Olympics held in 2004?)
In INFOMAP, the question can be formulated as a rule or template with four elements (denoted as "HAS-PART"):
"[5 Time]:[3 Organization]:[7 Q_Location]: ([9 LocationRelatedEvent])"
2004