Cross Language Information
Retrieval (CLIR)
INFORMATION SEARCHING AND RETRIEVAL (MLS 712)

PREPARED FOR:
ASSOC. PROF. HAJAH FUZIAH MOHD NADZAR
PREPARED BY:
ASYURA BINTI AMINORDIN (2012482362)
MOHD IQBAL AL-FARABI B YAHYA
(2012253658)
DATE: DECEMBER 17, 2012
Introduction
Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. For example, a user may pose their query in English but retrieve relevant documents written in French.

http://en.wikipedia.org/wiki/Cross-language_information_retrieval
CLIR Purpose
Researchers in Cross-Language Information Retrieval (CLIR) seek to support the process of finding documents written in one natural language with automated systems that can accept queries expressed in other languages.
English-Chinese Information Retrieval System (ECIRS)
ECIRS is a web-based English-Chinese Information Retrieval System. It provides a cross-language platform that helps people retrieve Chinese information without entering a Chinese query. The web-based client-server architecture allows more users to access ECIRS over the Internet.
Conts…
ECIRS consists of a client side and a server side. The client side is a web-based user interface. The server side includes bilingual dictionaries, content-based document index files, a Chinese search engine and Chinese document collections.
Conts…
Client side: allows a user to enter a query in English and send it to the server side; the result returned is a list of relevant documents in Chinese.

Server side: an English-Chinese dictionary and a Chinese-English dictionary are used to translate the user's query from English into Chinese keywords.
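
The dictionary-based translation step described here can be sketched roughly as follows. This is a minimal illustration, not the actual ECIRS implementation: the toy dictionary `EN_ZH`, the document set `DOCS`, and the functions `translate_query` and `search` are all assumptions made up for this example.

```python
# Hypothetical toy English-Chinese dictionary; a real system would load a full lexicon.
EN_ZH = {
    "computer": ["计算机", "电脑"],
    "network": ["网络"],
}

# Hypothetical Chinese document collection keyed by document id.
DOCS = {
    "d1": "计算机网络基础",
    "d2": "电脑硬件介绍",
    "d3": "中国历史概述",
}


def translate_query(query, dictionary):
    """Replace each English word with all of its Chinese dictionary translations."""
    keywords = []
    for word in query.lower().split():
        # Words missing from the dictionary are simply dropped in this sketch.
        keywords.extend(dictionary.get(word, []))
    return keywords


def search(query, dictionary, docs):
    """Return the ids of documents containing at least one translated keyword."""
    keywords = translate_query(query, dictionary)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(keyword in text for keyword in keywords)
    )


print(translate_query("computer", EN_ZH))
print(search("computer", EN_ZH, DOCS))
```

Keeping every translation of a word, rather than picking one, trades precision for recall; how ECIRS itself handles ambiguous translations is not stated in the slides.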
English - Chinese Information retrieval

Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
English - Chinese Information retrieval

Sidebar of the system, where the user can choose any of the buttons provided. Example: the On-line English Chinese Dictionary allows the user to translate an English word into a Chinese word.
English - Chinese Information retrieval

Keyword: computer

In the screenshot above, we enter any keyword we want to search for.
Example: Computer
Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
English - Chinese Information retrieval

Translation from English into Chinese

Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
English Chinese Information retrieval

The On-Line Chinese Information Retrieval System is the database holding all documents or information related to the information need, which here is “Computer”.
English Chinese Information retrieval

The list of documents related to “computer”. There were 294 results.
Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
English Chinese Information retrieval

Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
Big 5 - GB
Big 5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters.
GB stands for Guojia Biaozhun (国家标准, "national standard"). GB 2312 is the registered internet name for a key official character set of the People's Republic of China, used for Simplified Chinese characters.
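
To make the difference between the two encodings concrete, the short sketch below encodes the same two characters under each scheme using Python's built-in `big5` and `gb2312` codecs. The sample string is an arbitrary choice for illustration; it was picked because both characters exist in both character sets.

```python
text = "中文"  # both characters are present in Big 5 and in GB 2312

big5_bytes = text.encode("big5")
gb_bytes = text.encode("gb2312")

# The same characters map to different byte sequences under the two encodings.
print(big5_bytes.hex(), gb_bytes.hex())

# Decoding with the matching codec recovers the original text.
print(big5_bytes.decode("big5") == text)
print(gb_bytes.decode("gb2312") == text)

# Decoding Big 5 bytes with the GB 2312 codec gives wrong characters
# (or replacement marks), which is why a retrieval system must know
# which encoding each document uses.
garbled = big5_bytes.decode("gb2312", errors="replace")
print(garbled)
```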
Cross Language Information Retrieval

Layout of a website that people use to book hotels and flights for travel.
Conts…

Users can choose
any language.
Example: Japanese
Conts…

The layout changes to Japanese wording.

As we can see, the language of the layout has changed to Japanese.
Conts…

Google Translate allows users to identify the meaning of the Japanese words.
EXAMPLE: MALAY-to-JAPANESE
Conts…

Enter the translated word from Google Translate into the search engine of www.easytobook.com
Conts…

Click any result.

The result list shows that 131 hotels are available; the wording shown is still in Japanese.
Conts…

The description of the hotel in Kuala Lumpur is written
in Japanese.
CLIR WEBSITE EXAMPLE
http://www.cs.nmsu.edu/~sliu/main_frame.html
http://www.easytobook.com/
CINDOR (Conceptual Interlingua Document
Retrieval)
CINDOR is a cross-language text retrieval system capable of accepting a user's query stated in their native language and then seamlessly searching, retrieving, relevance ranking and displaying documents written in a variety of foreign languages.
CINDOR allows users of the system to state queries in any of the supported languages (currently English, French, Spanish, and Japanese) and to search and retrieve documents in any of the supported languages.
It adopts a ‘Conceptual Interlingua’: a unique approach to cross-language information management based on a language-independent conceptual representation.
CINDOR
Example from the ‘conceptual’ resource of the conceptual interlingua:
The concept “elasticity: the tendency of a body to return to its original shape after it has been stretched or compressed”, which has the label 131186, is instantiated in English and French:

131186 spring, give, springiness
131186 élasticité, flexibilité, moëlleux
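
One way to picture a language-independent representation like this is a lookup from (language, term) pairs to concept labels, so that a query and a document in different languages can be compared by the labels they share. The sketch below is only an illustration under that assumption: the concept label and term lists come from the slide, while `CONCEPTS`, `build_index` and `to_concepts` are hypothetical names and do not describe CINDOR's actual internals.

```python
# Concept label 131186 and its English/French terms are taken from the slide;
# the data structure and function names are hypothetical.
CONCEPTS = {
    131186: {
        "en": ["spring", "give", "springiness"],
        "fr": ["élasticité", "flexibilité", "moëlleux"],
    },
}


def build_index(concepts):
    """Map (language, term) pairs to the set of concept labels they express."""
    index = {}
    for label, terms_by_language in concepts.items():
        for language, terms in terms_by_language.items():
            for term in terms:
                index.setdefault((language, term), set()).add(label)
    return index


def to_concepts(text, language, index):
    """Represent a text as the set of concept labels its words map to."""
    labels = set()
    for word in text.lower().split():
        labels |= index.get((language, word), set())
    return labels


INDEX = build_index(CONCEPTS)
query = to_concepts("springiness", "en", INDEX)
document = to_concepts("la flexibilité du matériau", "fr", INDEX)

# A non-empty intersection means the English query and the French text share a concept.
print(query & document)
```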
The Eurovision St Andrews
Photographic Collection
The site presents the collection in a variety of ways: full-text search, or browsing a list of 999 pre-defined index terms organised alphabetically and hierarchically via a categories page.
The St Andrews collection (SAC) consists of 28,133 thumbnail images (around 120x76 pixels), larger versions of these images (around 368x234 pixels), and associated captions, giving a total of 84,399 files in the main body of the collection.
Eurovision
Photograph metadata:
(1) a unique record number,
(2) a short title,
(3) a full title,
(4) a textual description of the image content,
(5) the date when the photograph was taken (most frequently with the day, month and year),
(6) the originator, i.e. the name of an individual or company to which the photograph is attributed,
(7) the location of the photograph (e.g. the county and the country), and
(8) a line for notes to offer additional information about the photograph.
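
As a rough illustration, the eight fields listed above could be modelled as a simple record type. The class name `PhotoRecord`, the field names, and the choice of which fields feed a full-text search are assumptions made for this sketch; the sample values are placeholders, not a real record from the collection.

```python
from dataclasses import dataclass


@dataclass
class PhotoRecord:
    record_number: str   # (1) unique record number
    short_title: str     # (2)
    full_title: str      # (3)
    description: str     # (4) textual description of the image content
    date: str            # (5) kept as free text, since precision varies
    originator: str      # (6) individual or company credited
    location: str        # (7) e.g. county and country
    notes: str = ""      # (8) optional additional information

    def searchable_text(self):
        """Join the text fields that a full-text search might index (an assumption)."""
        parts = [self.short_title, self.full_title, self.description,
                 self.location, self.notes]
        return " ".join(part for part in parts if part).lower()


# Placeholder values for illustration only.
record = PhotoRecord(
    record_number="EXAMPLE-0001",
    short_title="Harbour view",
    full_title="Example harbour view from the pier",
    description="Boats moored in a small harbour",
    date="unknown",
    originator="Example Studio",
    location="Example county, Example country",
)

print(record.searchable_text())
```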
Eurovision
The St Andrews collection has been used for bilingual ad-hoc retrieval, where queries typical of this kind of historic collection have been generated in English and translated into languages including a range of Indo-European, Asian and Romance languages.
Challenges include:
Captions which are short in length, increasing the likelihood of vocabulary mismatch.
Captions with text not directly associated with the visual content of an image (e.g. expressing something in the background).
The use of colloquial and domain-specific language in the caption (i.e. British English).
The web interface to the St Andrews collection
CLIR University of Indonesia
Query expansion technique: pseudo relevance feedback
The assumption is that the top few documents initially retrieved are indeed relevant to the query, and so they must contain other terms that are also relevant to the query.
To choose the relevant terms from the top-ranked documents, we used the tf*idf term weighting formula.
We added a certain number of noun terms that have the highest weight scores.
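
The steps above can be sketched as a small function. This is a simplified illustration and not the University of Indonesia system: the whitespace tokenizer, the tf*idf variant (raw term frequency times the log of inverse document frequency), the function name `expand_query`, and its parameters are all assumptions. The `allowed_terms` argument stands in for the noun filter, since part-of-speech tagging is outside the scope of this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def expand_query(query, docs, top_k=2, n_terms=2, allowed_terms=None):
    """Pseudo relevance feedback: add the highest tf*idf terms from the top-ranked docs.

    allowed_terms stands in for the noun filter; None means every term is allowed.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency and idf of each term over the whole collection.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df[term]) for term in df}

    q_terms = tokenize(query)

    def score(tokens):
        # Initial retrieval score: sum of tf*idf over the query terms.
        tf = Counter(tokens)
        return sum(tf[term] * idf.get(term, 0.0) for term in q_terms)

    ranked = sorted(range(n_docs), key=lambda i: score(tokenized[i]), reverse=True)
    top = ranked[:top_k]  # assumed relevant, with no user judgement involved

    # Accumulate tf*idf weights of candidate expansion terms in the top documents.
    weights = Counter()
    for i in top:
        for term, tf in Counter(tokenized[i]).items():
            if term in q_terms:
                continue
            if allowed_terms is not None and term not in allowed_terms:
                continue
            weights[term] += tf * idf[term]

    best = sorted(weights, key=lambda term: (-weights[term], term))[:n_terms]
    return q_terms + best


# Made-up mini collection for illustration.
docs = [
    "flood jakarta rain jakarta rain",
    "flood jakarta evacuation",
    "football match result",
    "election result news",
]
print(expand_query("flood", docs))
```

If the top-ranked documents are not actually relevant, the added terms drift away from the user's intent, so `top_k` and `n_terms` are usually kept small.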
Interface and program demo
INFOMAP
• Chinese question classification is the process that analyzes a question and labels it based on its question type and expected answer type.
• The INFOMAP inference engine is adopted to support the knowledge-based approach for Chinese questions, which can be formulated as templates; SVM (Support Vector Machines) is used as the machine learning approach for large collections of labeled Chinese questions.
• INFOMAP is a knowledge representation framework that extracts important concepts from a natural language text.
• A feature of INFOMAP is its capability to represent and match complicated template structures, such as hierarchical matching, regular expressions, semantic template matching, frame (non-linear relations) matching, and graph matching.
• Using INFOMAP, we can identify the question category from a Chinese question.
Example
Question: (In which city were the Olympics held in 2004?)
In INFOMAP, the question can be formulated as a rule or template; this rule has four elements (denoted as "HAS-PART"):

"[5 Time]:[3 Organization]:[7 Q_Location]: ([9 LocationRelatedEvent])"
2004
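
The idea of matching a question against such a template can be illustrated with a toy slot matcher. This is not the INFOMAP engine: the slot names are taken from the template on the slide, but the numbers in the template are not interpreted, the lexicon uses hypothetical English glosses instead of segmented Chinese words, and the returned category name is an assumption.

```python
# Slot names come from the slide's template; everything else is hypothetical.
TEMPLATE = ["Time", "Organization", "Q_Location", "LocationRelatedEvent"]

# Toy lexicon of English glosses standing in for segmented Chinese words.
LEXICON = {
    "2004": "Time",
    "olympics": "Organization",
    "which-city": "Q_Location",
    "held": "LocationRelatedEvent",
}


def label_tokens(tokens, lexicon):
    """Map each token to its slot type, or None when the token is unknown."""
    return [lexicon.get(token.lower()) for token in tokens]


def matches_template(tokens, template, lexicon):
    """True if the recognised slot types, in order, equal the template."""
    slots = [slot for slot in label_tokens(tokens, lexicon) if slot is not None]
    return slots == template


def classify(tokens):
    """Return the question category implied by the matched template, if any."""
    if matches_template(tokens, TEMPLATE, LEXICON):
        return "Q_Location"  # assumed category: the question asks for a location
    return None


question = ["2004", "Olympics", "which-city", "held"]
print(classify(question))
```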
Searching Demo
Thank You

More Related Content

What's hot

Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)9866825059
 
Information Storage and Retrieval : A Case Study
Information Storage and Retrieval : A Case StudyInformation Storage and Retrieval : A Case Study
Information Storage and Retrieval : A Case StudyBhojaraju Gunjal
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLPRupak Roy
 
Indexing Techniques: Their Usage in Search Engines for Information Retrieval
Indexing Techniques: Their Usage in Search Engines for Information RetrievalIndexing Techniques: Their Usage in Search Engines for Information Retrieval
Indexing Techniques: Their Usage in Search Engines for Information RetrievalVikas Bhushan
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval ModelsNisha Arankandath
 
Information retrieval 13 alternative set theoretic models
Information retrieval 13 alternative set theoretic modelsInformation retrieval 13 alternative set theoretic models
Information retrieval 13 alternative set theoretic modelsVaibhav Khanna
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval ssilambu111
 
Information retrieval system
Information retrieval systemInformation retrieval system
Information retrieval systemLeslie Vargas
 
Information retrieval (introduction)
Information  retrieval (introduction) Information  retrieval (introduction)
Information retrieval (introduction) Primya Tamil
 
Digital library technologies
Digital library technologies Digital library technologies
Digital library technologies Shriram Pandey
 
Presentation google scholar
Presentation google scholarPresentation google scholar
Presentation google scholarmaryamfarooqi
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
Information Retrieval-4(inverted index_&_query handling)
Information Retrieval-4(inverted index_&_query handling)Information Retrieval-4(inverted index_&_query handling)
Information Retrieval-4(inverted index_&_query handling)Jeet Das
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notesAnandh Arumugakan
 
Information retrieval 14 fuzzy set models of ir
Information retrieval 14 fuzzy set models of irInformation retrieval 14 fuzzy set models of ir
Information retrieval 14 fuzzy set models of irVaibhav Khanna
 
Topic detection & tracking
Topic detection & trackingTopic detection & tracking
Topic detection & trackingGeorge Ang
 

What's hot (20)

Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Term weighting
Term weightingTerm weighting
Term weighting
 
Information Storage and Retrieval : A Case Study
Information Storage and Retrieval : A Case StudyInformation Storage and Retrieval : A Case Study
Information Storage and Retrieval : A Case Study
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
 
Indexing Techniques: Their Usage in Search Engines for Information Retrieval
Indexing Techniques: Their Usage in Search Engines for Information RetrievalIndexing Techniques: Their Usage in Search Engines for Information Retrieval
Indexing Techniques: Their Usage in Search Engines for Information Retrieval
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval Models
 
Information retrieval 13 alternative set theoretic models
Information retrieval 13 alternative set theoretic modelsInformation retrieval 13 alternative set theoretic models
Information retrieval 13 alternative set theoretic models
 
Signature files
Signature filesSignature files
Signature files
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval s
 
Information retrieval system
Information retrieval systemInformation retrieval system
Information retrieval system
 
Information retrieval (introduction)
Information  retrieval (introduction) Information  retrieval (introduction)
Information retrieval (introduction)
 
Text summarization
Text summarizationText summarization
Text summarization
 
Digital library technologies
Digital library technologies Digital library technologies
Digital library technologies
 
Presentation google scholar
Presentation google scholarPresentation google scholar
Presentation google scholar
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Information Retrieval-4(inverted index_&_query handling)
Information Retrieval-4(inverted index_&_query handling)Information Retrieval-4(inverted index_&_query handling)
Information Retrieval-4(inverted index_&_query handling)
 
Dspace software
Dspace softwareDspace software
Dspace software
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Information retrieval 14 fuzzy set models of ir
Information retrieval 14 fuzzy set models of irInformation retrieval 14 fuzzy set models of ir
Information retrieval 14 fuzzy set models of ir
 
Topic detection & tracking
Topic detection & trackingTopic detection & tracking
Topic detection & tracking
 

Similar to Cross language information retrieval (clir)slide

Arabic Dataset for Automatic Keyphrase Extraction
Arabic Dataset for Automatic Keyphrase ExtractionArabic Dataset for Automatic Keyphrase Extraction
Arabic Dataset for Automatic Keyphrase Extractioncscpconf
 
ARABIC DATASET FOR AUTOMATIC KEYPHRASE EXTRACTION
ARABIC DATASET FOR AUTOMATIC KEYPHRASE EXTRACTIONARABIC DATASET FOR AUTOMATIC KEYPHRASE EXTRACTION
ARABIC DATASET FOR AUTOMATIC KEYPHRASE EXTRACTIONcsandit
 
An unsupervised approach to develop ir system the case of urdu
An unsupervised approach to develop ir system  the case of urduAn unsupervised approach to develop ir system  the case of urdu
An unsupervised approach to develop ir system the case of urduijaia
 
A Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information RetrievalA Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information Retrievaldannyijwest
 
Ir 1 lec 7
Ir 1 lec 7Ir 1 lec 7
Ir 1 lec 7alaa223
 
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESMULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESijcseit
 
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVALA SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVALIJCI JOURNAL
 
Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsRichard Littauer
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageIJERA Editor
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIRA NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIRcscpconf
 
Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Jean Brenda
 
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMA LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMcscpconf
 
A language independent approach to develop urduir system
A language independent approach to develop urduir systemA language independent approach to develop urduir system
A language independent approach to develop urduir systemcsandit
 
Enriching the semantic web tutorial session 1
Enriching the semantic web tutorial session 1Enriching the semantic web tutorial session 1
Enriching the semantic web tutorial session 1Tobias Wunner
 
Digital Preservation Best Practices: Lessons Learned From Across the Pond
Digital Preservation Best Practices: Lessons Learned From Across the PondDigital Preservation Best Practices: Lessons Learned From Across the Pond
Digital Preservation Best Practices: Lessons Learned From Across the PondBenoit Pauwels
 
Digital Presentation Best Practices: Lessons Learned From Across the Pond
Digital Presentation Best Practices: Lessons Learned From Across the PondDigital Presentation Best Practices: Lessons Learned From Across the Pond
Digital Presentation Best Practices: Lessons Learned From Across the PondULB - Bibliothèques
 
Reference Model for an Open Archival Information Systems (OAIS): Overview and...
Reference Model for an Open Archival Information Systems (OAIS): Overview and...Reference Model for an Open Archival Information Systems (OAIS): Overview and...
Reference Model for an Open Archival Information Systems (OAIS): Overview and...faflrt
 

Similar to Cross language information retrieval (clir)slide (20)

07 04-06
07 04-0607 04-06
07 04-06
 
Arabic Dataset for Automatic Keyphrase Extraction
Arabic Dataset for Automatic Keyphrase ExtractionArabic Dataset for Automatic Keyphrase Extraction
Arabic Dataset for Automatic Keyphrase Extraction
 
ARABIC DATASET FOR AUTOMATIC KEYPHRASE EXTRACTION
ARABIC DATASET FOR AUTOMATIC KEYPHRASE EXTRACTIONARABIC DATASET FOR AUTOMATIC KEYPHRASE EXTRACTION
ARABIC DATASET FOR AUTOMATIC KEYPHRASE EXTRACTION
 
An unsupervised approach to develop ir system the case of urdu
An unsupervised approach to develop ir system  the case of urduAn unsupervised approach to develop ir system  the case of urdu
An unsupervised approach to develop ir system the case of urdu
 
A Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information RetrievalA Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information Retrieval
 
Ir 1 lec 7
Ir 1 lec 7Ir 1 lec 7
Ir 1 lec 7
 
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESMULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
 
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVALA SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
 
Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in Linguistics
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu Language
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
CASL Report1
CASL Report1CASL Report1
CASL Report1
 
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIRA NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
 
Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval
 
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMA LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
 
A language independent approach to develop urduir system
A language independent approach to develop urduir systemA language independent approach to develop urduir system
A language independent approach to develop urduir system
 
Enriching the semantic web tutorial session 1
Enriching the semantic web tutorial session 1Enriching the semantic web tutorial session 1
Enriching the semantic web tutorial session 1
 
Digital Preservation Best Practices: Lessons Learned From Across the Pond
Digital Preservation Best Practices: Lessons Learned From Across the PondDigital Preservation Best Practices: Lessons Learned From Across the Pond
Digital Preservation Best Practices: Lessons Learned From Across the Pond
 
Digital Presentation Best Practices: Lessons Learned From Across the Pond
Digital Presentation Best Practices: Lessons Learned From Across the PondDigital Presentation Best Practices: Lessons Learned From Across the Pond
Digital Presentation Best Practices: Lessons Learned From Across the Pond
 
Reference Model for an Open Archival Information Systems (OAIS): Overview and...
Reference Model for an Open Archival Information Systems (OAIS): Overview and...Reference Model for an Open Archival Information Systems (OAIS): Overview and...
Reference Model for an Open Archival Information Systems (OAIS): Overview and...
 

Recently uploaded

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Recently uploaded (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Cross language information retrieval (clir)slide

  • 1. Cross Language Information Retrieval (CLIR) INFORMATION SEARCHING AND RETRIEVAL (MLS 712) PREPARED FOR: ASSOC. PROF. HAJAH FUZIAH MOHD NADZAR PREPARED BY: ASYURA BINTI AMINORDIN (2012482362) MOHD IQBAL AL-FARABI B YAHYA (2012253658) DATE: DECEMBER 17, 2012
  • 2. Introduction Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. For example, a user may pose their query in English but retrieve relevant documents written in French. http://en.wikipedia.org/wiki/Cross-language_information_retrieval
  • 3. CLIR Purpose Researchers in Cross-Language Information Retrieval (CLIR) seek to support the process of finding documents written in one natural language with automated systems that can accept queries expressed in other languages.
  • 4. English-Chinese Information Retrieval System (ECIRS) ECIRS is a web-based English-Chinese information retrieval system. It provides a cross-language platform that helps people retrieve Chinese information without entering a Chinese query. The web-based client-server architecture lets more users access ECIRS over the Internet.
  • 5. Cont… ECIRS consists of a client side and a server side. The client side is a web-based user interface. The server side includes bilingual dictionaries, content-based document index files, a Chinese search engine and Chinese document collections.
  • 6. Cont… Client side: allows a user to enter a query in English and send it to the server side; the result is a list of relevant documents in Chinese. Server side: an English-Chinese dictionary and a Chinese-English dictionary are used to translate the user's query from English into Chinese keywords.
  • 7. English-Chinese Information Retrieval. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 8. English-Chinese Information Retrieval. Sidebar of the system, where the user can choose any of the buttons provided. Example: the On-line English Chinese Dictionary lets the user translate an English word into Chinese.
  • 9. English-Chinese Information Retrieval. Keyword: computer. In the screen shown above, we enter any keyword we want to search for. Example: Computer. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 10. English-Chinese Information Retrieval. Translation from English into Chinese. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 11. English-Chinese Information Retrieval. On-Line Chinese Information Retrieval System: the database holding all documents and information related to the information need, which is “Computer”.
  • 12. English-Chinese Information Retrieval. The list of documents related to “computer”. There were 294 results. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 13. English-Chinese Information Retrieval. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
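The dictionary-based query translation that the ECIRS server side performs can be sketched roughly as follows. This is a minimal illustration only: the `EN_ZH` dictionary and the `translate_query` function are hypothetical names invented here, and the slides do not show how ECIRS actually stores its dictionaries or handles unknown words.

```python
# Hypothetical toy English-to-Chinese dictionary (not ECIRS's real data).
EN_ZH = {
    "computer": ["计算机", "电脑"],
    "network": ["网络"],
}


def translate_query(query, dictionary):
    """Replace each English term with all of its dictionary translations."""
    translated = []
    for term in query.lower().split():
        # Assumption: terms missing from the dictionary are passed through
        # unchanged so the search engine can still try them.
        translated.extend(dictionary.get(term, [term]))
    return translated


keywords = translate_query("Computer network", EN_ZH)
print(keywords)
```

Keeping every candidate translation, as this sketch does, favours recall; a real system would probably need some way to rank or filter the alternatives.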
  • 14. Big 5 - GB. Big 5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters. GB (Guojia Biaozhun, 国家标准) is the registered internet name for a key official character set of the People's Republic of China, used for Simplified Chinese characters.
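To make the difference between the two encodings concrete, the short sketch below encodes the word "computer" in both scripts using Python's built-in `big5` and `gb2312` codecs. The choice of GB2312 as the GB variant is an assumption; the slide does not say which GB standard the system uses.

```python
traditional = "電腦"  # "computer" in Traditional Chinese
simplified = "电脑"   # "computer" in Simplified Chinese

# Both encodings use two bytes per Chinese character.
big5_bytes = traditional.encode("big5")
gb_bytes = simplified.encode("gb2312")
print(len(big5_bytes), len(gb_bytes))

# Decoding with the matching codec recovers the original text.
print(big5_bytes.decode("big5") == traditional)

# Decoding Big5 bytes as GB2312 does not recover the original text,
# which is why a retrieval system must know each document's encoding.
wrong = big5_bytes.decode("gb2312", errors="replace")
print(wrong == traditional)
```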
  • 15. Cross Language Information Retrieval. Layout of a website that people use to book hotels and flights for travel.
  • 16. Cont… Users can choose any language. Example: Japanese.
  • 17. Cont… Switching to Japanese. As we can see, the text in the layout changes to Japanese.
  • 18. Cont… Google Translate allows users to identify the meaning of the Japanese words. Example: Malay-to-Japanese.
  • 19. Cont… Enter the translated word from Google Translate into the search engine of www.easytobook.com
  • 20. Cont… Click any result. The result list shows 131 available hotels, and the text shown is still in Japanese.
  • 21. Cont… The description of the hotel in Kuala Lumpur is written in Japanese.
  • 23. CINDOR (Conceptual Interlingua Document Retrieval). A cross-language text retrieval system capable of accepting a user's query stated in their native language and then seamlessly searching, retrieving, relevance ranking and displaying documents written in a variety of foreign languages. CINDOR allows users to state queries in any of the supported languages (currently English, French, Spanish, and Japanese) and to search and retrieve documents in any of the supported languages. It adopts a ‘Conceptual Interlingua’: a unique approach to cross-language information management based on a language-independent conceptual representation.
  • 24. CINDOR. The ‘conceptual’ resource of the conceptual interlingua: the concept “elasticity: the tendency of a body to return to its original shape after it has been stretched or compressed”, which has the label 131186, is instantiated in English and French. English: 131186 spring, give, springiness. French: 131186 élasticité, flexibilité, moëlleux.
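The idea of matching through a language-independent concept label can be sketched as below. Only the concept number 131186 and its English and French terms come from the slide; the `CONCEPTS` table layout and the `to_concepts` and `matches` helpers are hypothetical, and the sketch ignores word-sense disambiguation (for example, "spring" as a season), which a real system would have to handle.

```python
# Concept table keyed by the language-independent label from the slide.
CONCEPTS = {
    131186: {
        "en": {"spring", "give", "springiness"},
        "fr": {"élasticité", "flexibilité", "moëlleux"},
    },
}


def to_concepts(text, lang):
    """Map a text to the set of concept labels its words instantiate."""
    tokens = set(text.lower().split())
    return {
        label
        for label, terms in CONCEPTS.items()
        if tokens & terms.get(lang, set())
    }


def matches(query, query_lang, doc, doc_lang):
    """A query matches a document if they share at least one concept."""
    return bool(to_concepts(query, query_lang) & to_concepts(doc, doc_lang))


print(matches("élasticité", "fr", "the springiness of rubber", "en"))
```

Because both query and document are reduced to concept labels, no direct French-to-English translation step is needed in this sketch.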
  • 25. The Eurovision St Andrews Photographic Collection Site presents the collection in a variety of ways: full-text search, or browsing a list of 999 pre-defined index terms organised alphabetically and hierarchically via a categories page. SAC consists of 28,133 thumbnail images (around 120x76 pixels), larger versions of these images (around 368x234 pixels), and associated captions, giving a total of 84,399 files in the main body of the collection.
  • 26. Eurovision. Photograph metadata: (1) a unique record number, (2) a short title, (3) a full title, (4) a textual description of the image content, (5) the date when the photograph was taken (most frequently with the day, month and year), (6) the originator, i.e. the name of an individual or company to which the photograph is attributed, (7) the location of the photograph (e.g. the county and the country), and (8) a line for notes offering additional information about the photograph.
  • 27. Eurovision. The St Andrews collection has been used for bilingual ad hoc retrieval, where queries typical of this kind of historic collection were generated in English and translated into a range of Indo-European, Asian and Romance languages. Challenges include: captions that are short, increasing the likelihood of vocabulary mismatch; captions with text not directly associated with the visual content of an image (e.g. describing something in the background); and the use of colloquial and domain-specific language in the captions (i.e. British English).
  • 28. The web interface to the St Andrews collection
  • 29. The web interface to the St Andrews collection
  • 30. CLIR, University of Indonesia. Query expansion technique: pseudo relevance feedback. It assumes that the top few documents initially retrieved are indeed relevant to the query, and so they must contain other terms that are also relevant to the query. To choose the relevant terms from the top-ranked documents, we used the tf*idf term weighting formula. We added a certain number of noun terms that have the highest weight scores.
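Pseudo relevance feedback with tf*idf weighting can be sketched as follows. This is an illustrative sketch under several assumptions, not the University of Indonesia implementation: the function name, its parameters and the toy collection are invented here, the idf is a smoothed variant chosen to avoid division by zero, and the restriction to noun terms mentioned in the slide is omitted because it would require a part-of-speech tagger.

```python
import math
from collections import Counter


def expand_query(query_terms, ranked_docs, collection, top_docs=2, extra_terms=2):
    """Add the highest-weighted tf*idf terms from the top-ranked documents."""
    n_docs = len(collection)
    # Document frequency of each term over the whole collection.
    df = Counter(term for doc in collection for term in set(doc))
    # Term frequency summed over the top documents, assumed to be relevant.
    tf = Counter(term for doc in ranked_docs[:top_docs] for term in doc)
    scores = {
        term: count * math.log((n_docs + 1) / (df[term] + 1))  # smoothed idf
        for term, count in tf.items()
        if term not in query_terms
    }
    # Highest score first; ties are broken alphabetically for determinism.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:extra_terms]
    return list(query_terms) + best


# Toy tokenised collection (hypothetical data).
collection = [
    ["hotel", "kuala", "lumpur", "booking"],
    ["hotel", "booking", "flight"],
    ["computer", "network"],
    ["computer", "hotel"],
]
# Pretend the first two documents were ranked highest for the query "hotel".
expanded = expand_query(["hotel"], collection[:2], collection)
print(expanded)
```

The expanded query would then be submitted to the search engine again in place of the original one.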
  • 33. INFOMAP. Chinese question classification is the process that analyzes a question and labels it based on its question type and expected answer type. The system adopts the INFOMAP inference engine to support the knowledge-based approach for Chinese questions, which can be formulated as templates, and uses SVM (Support Vector Machines) as the machine learning approach for large collections of labeled Chinese questions. INFOMAP is a knowledge representation framework that extracts important concepts from a natural language text. A feature of INFOMAP is its capability to represent and match complicated template structures, such as hierarchical matching, regular expressions, semantic template matching, frame (non-linear relations) matching, and graph matching. Using INFOMAP, we can identify the question category from a Chinese question.
  • 34. Example Question  (In which city were the Olympics held in 2004?) INFOMAP can be formulated as a rule or template (four elements (denoted as "HAS-PART") in this rule)   "[5 Time]:[3 Organization]:[7 Q_Location]: ([9 LocationRelatedEvent])“ 2004