SlideShare ist ein Scribd-Unternehmen logo
1 von 23
Downloaden Sie, um offline zu lesen
Information Retrieval
(from EBMT examples)
Michael P. Oakes
University of Wolverhampton
Birmingham Winter School, 2013
Finding Out About (FOA)
 “Finding Out About”, by Richard Belew (2000),
Cambridge University Press.
 Finding Out About (FOA) - research activities that
allow a decision maker to draw on others’ knowledge,
especially the WWW = “Information Retrieval”.
 A library (or WWW) contains many books (“documents”)
on many topics. The authors are typically people far
from our time or place, using language similar but not
identical to our own. We must FOA a topic of special
interest, looking only for those things which are
relevant to our search. This basic skill is a fundamental
part of an academic’s job:
FOA has 3 phases:

 asking a question
 constructing an answer
 assessing an answer
1. Asking a question
 Users of a search engine may be aware of a specific
gap in their knowledge, and are motivated to fill it
(meta-cognition about ignorance). They may not be able
to articulate their knowledge gap. Forming a clearly
posed question is the hardest part of answering it! This
common cognitive state is the user’s information need.
User’s try to take their ill-defined, internal cognitive state
and turn it into an external expression of their question.
This external expression is called the query, and the
language in which it is constructed the query
language.
2. Constructing an answer
 A human question-answerer might consider:
 can they translate the user’s ill-formed query into a
better one?
 do they know the answer themselves?
 are they able to verbalise this answer in terms the user
will understand?
 can they provide the necessary background knowledge
for the user to understand the answer itself?
 Current search engines are slightly more limited in
scope. The search engine has available to it only a preexisting set of “canned” texts (although this may be very
large), and its response is limited to identifying one or
more of these passages and presenting them to the
users.
3. Assessing the answer
 Assessing the answer
 We would normally give feedback to a human answerer,
e.g. “That isn’t what I meant”, “Let me ask it another
way”, “That helps, but I still have this problem” or “What
does that have to do with anything?”. So we “close the
loop” when the user provides an assessment of how
relevant the found the answer provided. In an
automatic system this is relevance feedback - the user
reacts to each retrieved document as “relevant”, “not
relevant” or “neutral”.
 See fig 1.4. The three steps in a computerised,
algorithmic context, information retrieval.
Keywords
 The fundamental operation performed by a search
engine is a match between descriptive features
mentioned by users in their queries, and documents
sharing those features. By far the most important kind
of features are keywords.
 Keywords are linguistic atoms - typically words, pieces
of words, or phrases - used to characterise the subject
or content of the document.
 They are pivotal because they must bridge the gap
between the users’ characterisation of their information
need (queries) and the characterisation of the
documents’ topical focus against which these will be
matched.
 Contrast natural language queries with bag-of-words
queries.
Keywords as document descriptors

 Keywords are also used as document
descriptors.
 Indexing is the process of associating one or
more keywords with each document.
 The vocabulary used can either be controlled
or uncontrolled (also known as closed or
open). If we organise a conference, and ask
the authors of each paper to index it manually
using only terms on a fixed list of potential
keywords, this is a closed vocabulary.
Query syntax
 Query syntax. A typical search engine
query consists of 2 to 3 words.
 Queries defined only as sets of keywords
are simple queries - most search engines
use this “bag of words” approach.
 Other possibilities exist e.g. Boolean
operators and / or / not e.g. “neural
networks AND speech recognition”.
 Verb(subject, object) triples e.g. “aspirin
treats blood_clotting”
Document length
 Document length. Longer documents can
discuss more topics, and hence be associated
with more keywords, and thus are more likely
to be retrieved.
 This means we must normalise documents’
indices in some way to compensate for
differing lengths.
 We also assume that the smallest unit of text
with appreciable “aboutness” is the paragraph,
and larger documents are constructed of a
number of paragraphs.
Stemming
 Stemming aims to remove surface markings
(such as number) to reveal a root form
 Using a token’s root form as an index term can
give robust retrieval even when the query
contains the plural CARS while the document
contains the singular CAR
 Linguists distinguish inflectional morphology
(plurals, third person singular, past tense, -ing)
from derivational morphology (e.g. teach
(verb), teacher (noun)). Weak vs. strong
stemming.
Example stemming rules
 (.*)SSES  /1SS: PERL-like syntax to say that
strings ending in –SSES should be transformed
by taking the stem (characters before –SSES)
and adding only the two characters –SS.
 (.*)IES  /1Y
 A complete stemmer contains many such rules
(60 in Lovins’ set), and a regime for handling
conflicts when multiple rules match the same
token, e.g. longest match, rule order.
Pros and Cons of Stemming
 Reduces the size of the keyword vocabulary, allowing
compression of the index files of 10 – 50%.
 Increases recall – a query on FOOTBALL now also
finds documents on FOOTBALLER(S), FOOTBALLING.
 Reduces precision – stripping away morphological
features may obscure differences in word meanings.
For example, GRAVITY has two senses (earth’s pull,
seriousness). GRAVITATION can only refer to earth’s
pull – but if we stem it to GRAVITY, it could mean either.
Calculating TF-IDF weighting

The Vector Space Model
The cosine similarity measure


How well are we doing (1)?
 Evaluation of search engines is notoriously
difficult. However, we have two measures of
objective assessment. The first step is to
focus on a particular query.
 We identify the set of documents Rel that are
determined to be relevant to it (subjective!).
 A perfect search engine would retrieve all and
only the documents in Rel.
 See fig. 1.10
Recall
 Clearly, the number of documents that were designated
both relevant and retrieved, Retr ∩ Rel will be a key
measure of success.
 But we must compare the size of the set
|Retr ∩ Rel| to something.
 If we were very concerned that the search engine
retrieve every relevant document (e.g. every prior ruling
relevant to a judicial case) , we should compare the
intersection to the number of documents marked as
relevant |Rel|.
 This measure is known as recall = |Retr ∩ Rel| / |Rel| :
Precision

 However, we might instead be worried about how much of
what we see is relevant (search engine users want a lot of
relevant hits on the first page), so an equally reasonable
standard of comparison is precision, the proportion of
retrieved documents which are in fact relevant:
 P = |Retr ∩ Rel|  |Retr|
Retrieved versus Relevant Docs
High Recall Retrieval

Retrieved

Relevant

High Precision Retrieval
Conclusion: IR for short documents like
translation examples

 Prior annotation of the corpus of past translations (Clifton and
Teahan, 2004, QA systems)
 Similarity measures for short segments of text: stemming to
increase recall, then document expansion, Kullback-Leibler
Divergence as a similarity measure (Metzler et al., 2007).
 Tao Tao et al (2006). Need for short text matching:
query/image caption, sponsored search: query/ad keyword;
query reformulation: query / query similarity
 Document expansion. Rohit Gupta expands the documents
with all entries in the PPDB Paraphrase database: lexical,
phrasal and syntactic.
 Relevance feedback? (“more like this”).

Weitere ähnliche Inhalte

Was ist angesagt?

Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
Text data mining1
Text data mining1Text data mining1
Text data mining1KU Leuven
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrievalssbd6985
 
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...Carrie Wang
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational SemanticsMarina Santini
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarizationdamom77
 
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Waqas Tariq
 
Text Data Mining
Text Data MiningText Data Mining
Text Data MiningKU Leuven
 
TEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHS
TEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHSTEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHS
TEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHSijcsit
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsIJCERT JOURNAL
 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewINFOGAIN PUBLICATION
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...cseij
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information RetrievalHarsh Thakkar
 
Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...ijnlc
 
Dimensions of Media Object Comprehensibility
Dimensions of Media Object ComprehensibilityDimensions of Media Object Comprehensibility
Dimensions of Media Object ComprehensibilityLawrie Hunter
 

Was ist angesagt? (20)

Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
 
Aq35241246
Aq35241246Aq35241246
Aq35241246
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
New Quantitative Methodology for Identification of Drug Abuse Based on Featur...
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarization
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C...
 
Text Data Mining
Text Data MiningText Data Mining
Text Data Mining
 
TEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHS
TEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHSTEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHS
TEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHS
 
POPSI
POPSIPOPSI
POPSI
 
Cc35451454
Cc35451454Cc35451454
Cc35451454
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews
 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A Review
 
NLP todo
NLP todoNLP todo
NLP todo
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information Retrieval
 
Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...Taxonomy extraction from automotive natural language requirements using unsup...
Taxonomy extraction from automotive natural language requirements using unsup...
 
Dimensions of Media Object Comprehensibility
Dimensions of Media Object ComprehensibilityDimensions of Media Object Comprehensibility
Dimensions of Media Object Comprehensibility
 

Andere mochten auch

18. Alessandro Cattelan (Translated) Terminology
18. Alessandro Cattelan (Translated) Terminology18. Alessandro Cattelan (Translated) Terminology
18. Alessandro Cattelan (Translated) TerminologyRIILP
 
9. Manuel Harranz (pangeanic) Hybrid Solutions for Translation
9. Manuel Harranz (pangeanic) Hybrid Solutions for Translation9. Manuel Harranz (pangeanic) Hybrid Solutions for Translation
9. Manuel Harranz (pangeanic) Hybrid Solutions for TranslationRIILP
 
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translationRIILP
 
9. Ethics - Juan Jose Arevalillo Doval (Hermes)
9. Ethics - Juan Jose Arevalillo Doval (Hermes)9. Ethics - Juan Jose Arevalillo Doval (Hermes)
9. Ethics - Juan Jose Arevalillo Doval (Hermes)RIILP
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine TranslationRIILP
 
3. Natalia Konstantinova (UoW) EXPERT Introduction
3. Natalia Konstantinova (UoW) EXPERT Introduction3. Natalia Konstantinova (UoW) EXPERT Introduction
3. Natalia Konstantinova (UoW) EXPERT IntroductionRIILP
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memoriesRIILP
 
8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for TranslationRIILP
 
2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT IntroductionRIILP
 
1. EXPERT Winter School Partner Introductions
1. EXPERT Winter School Partner Introductions1. EXPERT Winter School Partner Introductions
1. EXPERT Winter School Partner IntroductionsRIILP
 
10. Lucia Specia (USFD) Evaluation of Machine Translation
10. Lucia Specia (USFD) Evaluation of Machine Translation10. Lucia Specia (USFD) Evaluation of Machine Translation
10. Lucia Specia (USFD) Evaluation of Machine TranslationRIILP
 
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...RIILP
 
6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine TranslationRIILP
 
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...RIILP
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for TranslationRIILP
 

Andere mochten auch (15)

18. Alessandro Cattelan (Translated) Terminology
18. Alessandro Cattelan (Translated) Terminology18. Alessandro Cattelan (Translated) Terminology
18. Alessandro Cattelan (Translated) Terminology
 
9. Manuel Harranz (pangeanic) Hybrid Solutions for Translation
9. Manuel Harranz (pangeanic) Hybrid Solutions for Translation9. Manuel Harranz (pangeanic) Hybrid Solutions for Translation
9. Manuel Harranz (pangeanic) Hybrid Solutions for Translation
 
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
11. manuel leiva & juanjo arevalillo (hermes) evaluation of machine translation
 
9. Ethics - Juan Jose Arevalillo Doval (Hermes)
9. Ethics - Juan Jose Arevalillo Doval (Hermes)9. Ethics - Juan Jose Arevalillo Doval (Hermes)
9. Ethics - Juan Jose Arevalillo Doval (Hermes)
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
3. Natalia Konstantinova (UoW) EXPERT Introduction
3. Natalia Konstantinova (UoW) EXPERT Introduction3. Natalia Konstantinova (UoW) EXPERT Introduction
3. Natalia Konstantinova (UoW) EXPERT Introduction
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
 
8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation
 
2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction2. Constantin Orasan (UoW) EXPERT Introduction
2. Constantin Orasan (UoW) EXPERT Introduction
 
1. EXPERT Winter School Partner Introductions
1. EXPERT Winter School Partner Introductions1. EXPERT Winter School Partner Introductions
1. EXPERT Winter School Partner Introductions
 
10. Lucia Specia (USFD) Evaluation of Machine Translation
10. Lucia Specia (USFD) Evaluation of Machine Translation10. Lucia Specia (USFD) Evaluation of Machine Translation
10. Lucia Specia (USFD) Evaluation of Machine Translation
 
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran...
 
6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation
 
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
12. Gloria Corpas, Jorge Leiva, Miriam Seghiri (UMA) Human Translation & Tran...
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation
 

Ähnlich wie 14. Michael Oakes (UoW) Natural Language Processing for Translation

Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case StudyIJERA Editor
 
Boolean Retrieval
Boolean RetrievalBoolean Retrieval
Boolean Retrievalmghgk
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantinimaxfalc
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)9866825059
 
Domain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachDomain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachWaqas Tariq
 
Technical Whitepaper: A Knowledge Correlation Search Engine
Technical Whitepaper: A Knowledge Correlation Search EngineTechnical Whitepaper: A Knowledge Correlation Search Engine
Technical Whitepaper: A Knowledge Correlation Search Engines0P5a41b
 
6&7-Query Languages & Operations.ppt
6&7-Query Languages & Operations.ppt6&7-Query Languages & Operations.ppt
6&7-Query Languages & Operations.pptBereketAraya
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningIOSR Journals
 
Content Analyst - Conceptualizing LSI Based Text Analytics White Paper
Content Analyst - Conceptualizing LSI Based Text Analytics White PaperContent Analyst - Conceptualizing LSI Based Text Analytics White Paper
Content Analyst - Conceptualizing LSI Based Text Analytics White PaperJohn Felahi
 
Query formulation process
Query formulation processQuery formulation process
Query formulation processmalathimurugan
 
Aggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperAggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperDBOnto
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrievalunyil96
 
Chapter 1: Introduction to Information Storage and Retrieval
Chapter 1: Introduction to Information Storage and RetrievalChapter 1: Introduction to Information Storage and Retrieval
Chapter 1: Introduction to Information Storage and Retrievalcaptainmactavish1996
 
Sentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersSentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersMOHDSAIFWAJID1
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingWaqas Tariq
 

Ähnlich wie 14. Michael Oakes (UoW) Natural Language Processing for Translation (20)

Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case Study
 
Word Embedding In IR
Word Embedding In IRWord Embedding In IR
Word Embedding In IR
 
Boolean Retrieval
Boolean RetrievalBoolean Retrieval
Boolean Retrieval
 
[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Domain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachDomain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised Approach
 
Technical Whitepaper: A Knowledge Correlation Search Engine
Technical Whitepaper: A Knowledge Correlation Search EngineTechnical Whitepaper: A Knowledge Correlation Search Engine
Technical Whitepaper: A Knowledge Correlation Search Engine
 
6&7-Query Languages & Operations.ppt
6&7-Query Languages & Operations.ppt6&7-Query Languages & Operations.ppt
6&7-Query Languages & Operations.ppt
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
 
Content Analyst - Conceptualizing LSI Based Text Analytics White Paper
Content Analyst - Conceptualizing LSI Based Text Analytics White PaperContent Analyst - Conceptualizing LSI Based Text Analytics White Paper
Content Analyst - Conceptualizing LSI Based Text Analytics White Paper
 
Lec 2
Lec 2Lec 2
Lec 2
 
Query formulation process
Query formulation processQuery formulation process
Query formulation process
 
Aggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperAggregating Semantic Annotators Paper
Aggregating Semantic Annotators Paper
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrieval
 
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
 
Chapter 1: Introduction to Information Storage and Retrieval
Chapter 1: Introduction to Information Storage and RetrievalChapter 1: Introduction to Information Storage and Retrieval
Chapter 1: Introduction to Information Storage and Retrieval
 
Sentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersSentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clusters
 
The Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language ProcessingThe Process of Information extraction through Natural Language Processing
The Process of Information extraction through Natural Language Processing
 

Mehr von RIILP

Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD RIILP
 
Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic RIILP
 
Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones RIILP
 
Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones RIILP
 
Gianluca Giulinin - FAO
Gianluca Giulinin - FAO Gianluca Giulinin - FAO
Gianluca Giulinin - FAO RIILP
 
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic RIILP
 
Tony O'Dowd - KantanMT
Tony O'Dowd -  KantanMT Tony O'Dowd -  KantanMT
Tony O'Dowd - KantanMT RIILP
 
Santanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAARSantanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAARRIILP
 
Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU RIILP
 
Anna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMAAnna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMARIILP
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD RIILP
 
Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW RIILP
 
Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA RIILP
 
Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU RIILP
 
Liling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAARLiling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAARRIILP
 
Sandra de luca - Acclaro
Sandra de luca - AcclaroSandra de luca - Acclaro
Sandra de luca - AcclaroRIILP
 
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015RIILP
 
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015RIILP
 
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015RIILP
 
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015RIILP
 

Mehr von RIILP (20)

Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD
 
Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic
 
Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones
 
Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones
 
Gianluca Giulinin - FAO
Gianluca Giulinin - FAO Gianluca Giulinin - FAO
Gianluca Giulinin - FAO
 
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
 
Tony O'Dowd - KantanMT
Tony O'Dowd -  KantanMT Tony O'Dowd -  KantanMT
Tony O'Dowd - KantanMT
 
Santanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAARSantanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAAR
 
Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU
 
Anna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMAAnna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMA
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD
 
Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW
 
Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA
 
Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU
 
Liling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAARLiling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAAR
 
Sandra de luca - Acclaro
Sandra de luca - AcclaroSandra de luca - Acclaro
Sandra de luca - Acclaro
 
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
 
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
 
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
 
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
 

Kürzlich hochgeladen

Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

14. Michael Oakes (UoW) Natural Language Processing for Translation

  • 1. Information Retrieval (from EBMT examples) Michael P. Oakes University of Wolverhampton Birmingham Winter School, 2013
  • 2. Finding Out About (FOA)  “Finding Out About”, by Richard Belew (2000), Cambridge University Press.  Finding Out About (FOA) - research activities that allow a decision maker to draw on others’ knowledge, especially the WWW = “Information Retrieval”.  A library (or WWW) contains many books (“documents”) on many topics. The authors are typically people far from our time or place, using language similar but not identical to our own. We must FOA a topic of special interest, looking only for those things which are relevant to our search. This basic skill is a fundamental part of an academic’s job:
  • 3. FOA has 3 phases:  asking a question  constructing an answer  assessing an answer
  • 4.
  • 5. 1. Asking a question  Users of a search engine may be aware of a specific gap in their knowledge, and are motivated to fill it (meta-cognition about ignorance). They may not be able to articulate their knowledge gap. Forming a clearly posed question is the hardest part of answering it! This common cognitive state is the user’s information need. User’s try to take their ill-defined, internal cognitive state and turn it into an external expression of their question. This external expression is called the query, and the language in which it is constructed the query language.
  • 6. 2. Constructing an answer  A human question-answerer might consider:  can they translate the user’s ill-formed query into a better one?  do they know the answer themselves?  are they able to verbalise this answer in terms the user will understand?  can they provide the necessary background knowledge for the user to understand the answer itself?  Current search engines are slightly more limited in scope. The search engine has available to it only a preexisting set of “canned” texts (although this may be very large), and its response is limited to identifying one or more of these passages and presenting them to the users.
  • 7. 3. Assessing the answer  Assessing the answer  We would normally give feedback to a human answerer, e.g. “That isn’t what I meant”, “Let me ask it another way”, “That helps, but I still have this problem” or “What does that have to do with anything?”. So we “close the loop” when the user provides an assessment of how relevant the found the answer provided. In an automatic system this is relevance feedback - the user reacts to each retrieved document as “relevant”, “not relevant” or “neutral”.  See fig 1.4. The three steps in a computerised, algorithmic context, information retrieval.
  • 8.
  • 9. Keywords  The fundamental operation performed by a search engine is a match between descriptive features mentioned by users in their queries, and documents sharing those features. By far the most important kind of features are keywords.  Keywords are linguistic atoms - typically words, pieces of words, or phrases - used to characterise the subject or content of the document.  They are pivotal because they must bridge the gap between the users’ characterisation of their information need (queries) and the characterisation of the documents’ topical focus against which these will be matched.  Contrast natural language queries with bag-of-words queries.
  • 10. Keywords as document descriptors  Keywords are also used as document descriptors.  Indexing is the process of associating one or more keywords with each document.  The vocabulary used can either be controlled or uncontrolled (also known as closed or open). If we organise a conference, and ask the authors of each paper to index it manually using only terms on a fixed list of potential keywords, this is a closed vocabulary.
  • 11. Query syntax  Query syntax. A typical search engine query consists of 2 to 3 words.  Queries defined only as sets of keywords are simple queries - most search engines use this “bag of words” approach.  Other possibilities exist e.g. Boolean operators and / or / not e.g. “neural networks AND speech recognition”.  Verb(subject, object) triples e.g. “aspirin treats blood_clotting”
  • 12. Document length  Document length. Longer documents can discuss more topics, and hence be associated with more keywords, and thus are more likely to be retrieved.  This means we must normalise documents’ indices in some way to compensate for differing lengths.  We also assume that the smallest unit of text with appreciable “aboutness” is the paragraph, and larger documents are constructed of a number of paragraphs.
  • 13. Stemming  Stemming aims to remove surface markings (such as number) to reveal a root form  Using a token’s root form as an index term can give robust retrieval even when the query contains the plural CARS while the document contains the singular CAR  Linguists distinguish inflectional morphology (plurals, third person singular, past tense, -ing) from derivational morphology (e.g. teach (verb), teacher (noun)). Weak vs. strong stemming.
  • 14. Example stemming rules  (.*)SSES  /1SS: PERL-like syntax to say that strings ending in –SSES should be transformed by taking the stem (characters before –SSES) and adding only the two characters –SS.  (.*)IES  /1Y  A complete stemmer contains many such rules (60 in Lovins’ set), and a regime for handling conflicts when multiple rules match the same token, e.g. longest match, rule order.
  • 15. Pros and Cons of Stemming  Reduces the size of the keyword vocabulary, allowing compression of the index files of 10 – 50%.  Increases recall – a query on FOOTBALL now also finds documents on FOOTBALLER(S), FOOTBALLING.  Reduces precision – stripping away morphological features may obscure differences in word meanings. For example, GRAVITY has two senses (earth’s pull, seriousness). GRAVITATION can only refer to earth’s pull – but if we stem it to GRAVITY, it could mean either.
  • 18. The cosine similarity measure 
  • 19. How well are we doing (1)?  Evaluation of search engines is notoriously difficult. However, we have two measures of objective assessment. The first step is to focus on a particular query.  We identify the set of documents Rel that are determined to be relevant to it (subjective!).  A perfect search engine would retrieve all and only the documents in Rel.  See fig. 1.10
  • 20. Recall  Clearly, the number of documents that were designated both relevant and retrieved, Retr ∩ Rel will be a key measure of success.  But we must compare the size of the set |Retr ∩ Rel| to something.  If we were very concerned that the search engine retrieve every relevant document (e.g. every prior ruling relevant to a judicial case) , we should compare the intersection to the number of documents marked as relevant |Rel|.  This measure is known as recall = |Retr ∩ Rel| / |Rel| :
  • 21. Precision  However, we might instead be worried about how much of what we see is relevant (search engine users want a lot of relevant hits on the first page), so an equally reasonable standard of comparison is precision, the proportion of retrieved documents which are in fact relevant:  P = |Retr ∩ Rel| |Retr|
  • 22. Retrieved versus Relevant Docs High Recall Retrieval Retrieved Relevant High Precision Retrieval
  • 23. Conclusion: IR for short documents like translation examples  Prior annotation of the corpus of past translations (Clifton and Teahan, 2004, QA systems)  Similarity measures for short segments of text: stemming to increase recall, then document expansion, Kullback-Leibler Divergence as a similarity measure (Metzler et al., 2007).  Tao Tao et al (2006). Need for short text matching: query/image caption, sponsored search: query/ad keyword; query reformulation: query / query similarity  Document expansion. Rohit Gupta expands the documents with all entries in the PPDB Paraphrase database: lexical, phrasal and syntactic.  Relevance feedback? (“more like this”).