SlideShare a Scribd company logo
1 of 63
The Role of Natural Language Processingin Information Retrieval Searching for Meaning in Text Tony Russell-Rose, PhD 21-Mar-2011
Contents Introduction & Overview Background, Terminology NLP Fundamentals Paradigms & Perspectives NLP Tools & Techniques NLP Applications Text mining, Question Answering, etc. Conclusions & Further Reading 21-Mar-2011 2 NLP & IR
Introduction & Overview The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 3 NLP & IR
Information Retrieval Models, techniques & frameworks for satisfying an information need Common paradigm: document retrieval 21-Mar-2011 4 NLP & IR
Document Retrieval Methods include Boolean, vector space, probabilistic Rely on index terms “bag of words” Stop lists + stemming But text is “unstructured” Information may be “hidden” 21-Mar-2011 5 NLP & IR
NLP – a solved problem? As humans we do it effortlessly ... don’t we? DRUNK GETS NINE YEARS IN VIOLIN CASE PROSTITUTES APPEAL TO POPE  STOLEN PAINTING FOUND BY TREE  RED TAPE HOLDS UP NEW BRIDGE DEER KILL 300,000 RESIDENTS CAN DROP OFF TREES INCLUDE CHILDREN WHEN BAKING COOKIES  MINERS REFUSE TO WORK AFTER DEATH   21-Mar-2011 6 NLP & IR
Problems with Text (1) Polysemy One word maps to many concepts e.g. bat Synonymy One concept maps to many words 21-Mar-2011 7 NLP & IR
Problems with Text (2) Word order Venetian blind vs. Blind venetian 21-Mar-2011 8 NLP & IR
Problems with Text (3) Language is generative Starbucks coffee is the best The place I like most when I need to feed my caffeine addiction is the company from Seattle with branches everywhere Many different ways to express a given idea Synonymy, paraphrase, metaphor, etc. 21-Mar-2011 9 NLP & IR
Problems with Text (4) ,[object Object]
The meaning of a sentence is completely determined by the meaning of its symbols and the syntax used to combine themLanguage is a form of communication All communication has a context: time and place of utterance, the writer, the reader, their backgroundknowledge, intentions, assumptions re the reader’s knowledge/intentions, etc. 21-Mar-2011 10 NLP & IR
Problems with Text (5) Language is changing I want to buy a mobile Ill-formed input “accomodation office” Co-ordination, negation, etc. This is not a talk about neuro-linguistic programming Multi-linguality Claudia Schiffer is on the cover of Elle Also sarcasm, irony, slang, jargon, etc. That was a wicked lecture Yep – the coffee break was the best part 21-Mar-2011 11 NLP & IR
Enter NLP / Text Analytics… Text Analytics: a set of linguistic, analytical and predictive techniques to extract structure and meaning from unstructured documents NLP: the academic term for Text Analytics analogous to “search” vs. “IR” Text Analytics ~= Natural Language Processing ~= Text Mining Text Mining -> Scientific / technical context, automated processing Text Analytics -> Business context, interactive apps 21-Mar-2011 12 NLP & IR
NLP Fundamentals The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 13 NLP & IR
NLP Perspectives Computational Linguistics Use of computational techniques to study linguistic phenomena Cognitive science Study of human information processing (perception, language, reasoning, etc.) Computer science Theoretical foundations of computation and practical techniques for implementation  Information science Analysis, classification, manipulation, retrieval and dissemination of information 21-Mar-2011 14 NLP & IR
NLP Paradigms Symbolic approaches Rule-based, hand coded (by linguists) Knowledge-intensive Statistical approaches Shallow statistical models, trained on annotated corpora Data-intensive 21-Mar-2011 15 NLP & IR
NLP Fundamentals (word level) Language is AMBIGUOUS To find structure, we must remove ambiguity! Lexical analysis (tokenisation) The cat sat on the mat I can’t tokenise this sentence Stop word removal No definitive list The Who, The The, Take That… To be or not to be 21-Mar-2011 16 NLP & IR
NLP Fundamentals (word level) Stemming fishing, fished, fish, fisher -> fish Lemmatization Stemming + context Passing -> pass + ING Were -> be + PAST Delegate = de-leg-ate (?) Ratify = rat-ify (?) Morphology (prefixes, suffixes, etc.) Gebäudereinigungsfirmenangestellter -> Gebäude + Reinigung + Firma + Angestellter (building + cleaning + company + employee) 21-Mar-2011 17 NLP & IR
NLP Fundamentals (sentence level) Syntax (part of speech tagging) book -> NOUN, VERB that -> DETERMINER flight -> NOUN Book that flight -> VB DT NN Ambiguity problem Time flies like an arrow -> ? Fruit flies like a banana -> ? Eats shoots and leaves -> ? 21-Mar-2011 18 NLP & IR
NLP Fundamentals (sentence level) Parsing (grammar) I saw a venetian blind I saw a blind venetian I saw the man on the hill with a telescope Rugby is a game played by men with odd-shaped balls Sentence boundary detection Punctuation denotes the end of a sentence! “But not always!”, said Fred... 21-Mar-2011 19 NLP & IR
NLP Tools & Techniques The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 20 NLP & IR
Early NLP Systems ELIZA Wiezenbaum 1966 Simple pattern matching PARRY Colby et al 1971 Pattern matching & planning SHRDLU Winograd 1972 Natural language understanding Comprehensive grammar of English 21-Mar-2011 21 NLP & IR
Word Prediction Assistive technologies Aurora Systems, TextHelp Google, Bing, Yahoo query suggestions 21-Mar-2011 22 NLP & IR
Spelling Correction Auto-correct Did You Mean 21-Mar-2011 23 NLP & IR
Text Categorization News agencies: classify incoming news stories Search engines: classify queries, e.g. search Google for ‘LOG313’ Identifying spam emails, e.g. http://www.paulgraham.com/spam.html Routing email or documents to appropriate people 21-Mar-2011 24 NLP & IR
Terminology Extraction Differentiate between useful index terms and ‘noise’ Help lexicographers identify new terminology: the C4 carbons of the nicotinamide rings X-ray therapy vs. X-ray therapies PAHO vs. Pan-American Health Organization Term extraction systems process scientific papers to identify terminology, possibly comparing it with a known list 21-Mar-2011 25 NLP & IR
Speech Recognition Spoken Dialogue Systems Directory enquiries (AT&T) Railways (Germany, Switzerland, Italy) Airlines (USA) iPhone Voice Search IBM Watson 21-Mar-2011 26 NLP & IR
Named Entity Recognition Identification of key concepts,  e.g. people, places, organisations, etc. Also postcodes, temporal/numerical expressions, etc. “Mexico has been trying to stage a recovery since the beginning of this year and it's always been getting ahead of itself in terms of fundamentals,” said Matthew Hickman of Lehman Brothers in New York.” 21-Mar-2011 27 NLP & IR
Named Entity Recognition: Applications Increase precision of IR New companies in York vs. Companies in New York Support navigation SMART tags, Apture, etc. Improve machine translation: “I live in a new house in new york” Speech synthesis, auto-summarisation, etc. 21-Mar-2011 28 NLP & IR
Named Entity Recognition Systems usually developed for specific domain Are typically language and domain-specific Accuracy > 90%  for newswire text Best system at MUC-7 scored 93.39% f-measure human annotators scored 97.60% and 96.95%. Lower for certain domains, e.g. life sciences (species, substances, proteins, genes, etc.)  extended naming conventions, extensive use of conjunction and disjunction, high occurrence of neologisms and abbreviations, etc. 21-Mar-2011 29 NLP & IR
NER (Newssift Example) 21-Mar-2011 30 NLP & IR
Machine Translation (before) 21-Mar-2011 31 NLP & IR
Machine Translation (after) 21-Mar-2011 32 NLP & IR
Summarisation Section 3 discusses two information access applications (text mining and question answering) closely associated with NLP.  NLP Techniques Named entity recognition  Information Extraction  Current document retrieval technologies could not identify information as specific as this within text.  Word Sense Disambiguation Text Mining Question Answering  “Wh” questions List questions  Examples include those mentioned here: text mining, question answering and cross-language information retrieval.  Ponte and Croft (1998) “A Language Modeling Approach to Information Retrieval” SIGIR.   Sanderson, M. (1994) “Word sense disambiguation and information retrieval” Proceedings of the 17th ACM SIGIR Conference   Sanderson, M. (2000) “Retrieving with Good Sense” Information Retrieval 2(1):49-69.   Question Answering (Section 3.2).   Summarization single-document vs. multi-document MS Word Bing 21-Mar-2011 33 NLP & IR
Information Extraction ,[object Object]
Based on pre-defined structures, e.g.movements of company executives victims of terrorist attacks corporate mergers / acquisitions interactions between genes and proteins Example (from MUC-6): Now, Mr. James is preparing to sail into the sunset, and Mr. Dooner is poised to rev up the engines to guide Interpublic Group's McCann-Erickson into the 21st century. Yesterday, McCann made official what had been widely anticipated: Mr. James, 57 years old, is stepping down as chief executive officer on July 1 and will retire as chairman at the end of the year. He will be succeeded by Mr. Dooner, 45. 21-Mar-2011 34 NLP & IR
Information Extraction <SUCCESSION_EVENT-2> :=     SUCCESSION_ORG:          <ORGANIZATION-1>     POST: "chairman"     IN_AND_OUT: <IN_AND_OUT-4>     VACANCY_REASON: DEPART_WORKFORCE   <IN_AND_OUT-4> :=     IO_PERSON: <PERSON-1>     NEW_STATUS: IN     ON_THE_JOB: NO     OTHER_ORG: <ORGANIZATION-1>     REL_OTHER_ORG: SAME_ORG   <ORGANIZATION-1> :=     ORG_NAME: "McCann-Erickson"     ORG_ALIAS: "McCann"     ORG_TYPE:  COMPANY   <PERSON-1> :=     PER_NAME: "John J. Dooner Jr."     PER_ALIAS: "John Dooner" "Dooner" 21-Mar-2011 35 NLP & IR
Information Extraction Can now use as metadata (for retrieval) Or store in DB and query against it, e.g. “show me all the events where a finance officer left a company to take up the position of CEO in another company” 21-Mar-2011 36 NLP & IR
IE Example: Yahoo Correlator 21-Mar-2011 37 NLP & IR
IE Example: Google Wonder Wheel 21-Mar-2011 38 NLP & IR
IE Example: Yahoo Search Assist 21-Mar-2011 39 NLP & IR
Information Extraction But IE is difficult when concepts are spread across multiple sentences: Pace American Group Inc. said it notified two top executives it intends to dismiss them because an internal investigation found evidence of “self-dealing” and “undisclosed financial relationships.” The executives are Don H. Pace, president and CEO; and Greg S. Kaplan, SVP and CFO. Event & org in first S, but names in second S Need to identify connection between “two top executives”  and “the executives”. 21-Mar-2011 40 NLP & IR
Information Extraction Anaphora resolution relies on context (semantic / pragmatic knowledge): We gave the bananas to the monkeys because they were hungry. We gave the bananas to the monkeys because they were ripe. We gave the bananas to the monkeys because they were here.  21-Mar-2011 41 NLP & IR
WordNet Semantic database for English  See also EuroWordNet Based on synsets car = automobile | railway car | elevator car | cable car 1stsynset = car, motorcar, machine, auto, automobile  Connected via relationships E.g. hypernymy (a kind of) separate hierarchies for nouns, verbs, adjectives and adverbs 21-Mar-2011 42 NLP & IR
Word Sense Disambiguation Resolve ambiguities due to polysemy Often characterised by syntax, e.g. Magnesium is a light metal (ADJ) The light in the kitchen is quite bright (NOUN) Can use a Part of Speech Tagger to resolve broad ambiguities PoS tags = NN, NNP, NNS, NNPS, etc. 21-Mar-2011 43 NLP & IR
Dictionary Definition of “set” set    /sɛt/ Show Spelled [set] Show IPA verb, set, set·ting, noun, adjective, interjection  –verb (used with object) 1. to put (something or someone) in a particular place: to set a vase on a table.  2. to place in a particular position or posture: Set thebaby on his feet.  3. to place in some relation to something or someone: We set a supervisor over the new workers.  4. to put into some condition: to set a house on fire.  5. to put or apply: to set fire to a house.  6. to put in the proper position: to set a chair back on its feet.  7. to put in the proper or desired order or condition for use: to set a trap.  8. to distribute or arrange china, silver, etc., for use on (a table): to set the table for dinner.  9. to place (the hair, especially when wet) on rollers, in clips, or the like, so that the hair will assume a particular style.  10. to put (a price or value) upon something: He set $7500 as the right amount for the car. The teacher sets a high value on neatness.  11. to fix the value of at a certain amount or rate; value: He set the car at $500. She sets neatness at a high value.  12. to post, station, or appoint for the purpose of performing some duty: to set spies on a person.  13. to determine or fix definitely: to set a time limit.  14. to resolve or decide upon: to set a wedding date.  15. to cause to pass into a given state or condition: to set one's mind at rest; to set a prisoner free.  16. to direct or settle resolutely or wishfully: to set one's mind to a task.  17. to present as a model; place before others as a standard: to set a good example.  18. to establish for others to follow: to set a fast pace.  19. to prescribe or assign, as a task.  20. to adjust (a mechanism) so as to control its performance.  ,[object Object]
22. to adjust (a timer, alarm of a clock, etc.) so as to sound when desired: He set the alarm for seven o'clock.
23. to fix or mount (a gem or the like) in a frame or setting.
24. to ornament or stud with gems or the like: a bracelet set with pearls.
25. to cause to sit; seat: to set a child in a highchair.
26. to put (a hen) on eggs to hatch them.
27. to place (eggs) under a hen or in an incubator for hatching.
28. to place or plant firmly: to set a flagpole in concrete.
29. to put into a fixed, rigid, or settled state, as the face, muscles, etc.
30. to fix at a given point or calibration: to set the dial on an oven; to set a micrometer.
31. to tighten (often followed by up ): to set nuts well up.
32. to cause to take a particular direction: to set one's course to the south.
33. Surgery . to put (a broken or dislocated bone) back in position.
34. (of a hunting dog) to indicate the position of (game) by standing stiffly and pointing with the muzzle.
35. Music . a. to fit, as words to music.
b. to arrange for musical performance.
c. to arrange (music) for certain voices or instruments.
36. Theater . a. to arrange the scenery, properties, lights, etc., on (a stage) for an act or scene.

More Related Content

What's hot

Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
Natural language processing
Natural language processingNatural language processing
Natural language processingYogendra Tamang
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingMariana Soffer
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
NLP Project Presentation
NLP Project PresentationNLP Project Presentation
NLP Project PresentationAryak Sengupta
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP) ASWINKP11
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlpankit_ppt
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and originShubhankar Mohan
 
Natural language processing
Natural language processingNatural language processing
Natural language processingKarenVacca
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
 
Tokenization using nlp | NLP Course
Tokenization using nlp | NLP CourseTokenization using nlp | NLP Course
Tokenization using nlp | NLP CourseRAKESH P
 

What's hot (20)

Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLP Project Presentation
NLP Project PresentationNLP Project Presentation
NLP Project Presentation
 
Machine Tanslation
Machine TanslationMachine Tanslation
Machine Tanslation
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
NLP PPT.pptx
NLP PPT.pptxNLP PPT.pptx
NLP PPT.pptx
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP)
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Tokenization using nlp | NLP Course
Tokenization using nlp | NLP CourseTokenization using nlp | NLP Course
Tokenization using nlp | NLP Course
 

Viewers also liked

Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Kato Mivule
 
Basic supply-chain APMC bhaji market
Basic supply-chain APMC bhaji marketBasic supply-chain APMC bhaji market
Basic supply-chain APMC bhaji marketChetan Kute
 
India food processing overview
India food processing overviewIndia food processing overview
India food processing overviewYuvraj Kadam
 
Food Processing Industry India
Food Processing Industry IndiaFood Processing Industry India
Food Processing Industry IndiaAnkit Agarwal
 
Food processing industry.
Food processing industry.Food processing industry.
Food processing industry.Rachana Tiwari
 

Viewers also liked (6)

Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
 
Basic supply-chain APMC bhaji market
Basic supply-chain APMC bhaji marketBasic supply-chain APMC bhaji market
Basic supply-chain APMC bhaji market
 
India food processing overview
India food processing overviewIndia food processing overview
India food processing overview
 
Indian Food Processing Sector_2016
Indian Food Processing Sector_2016Indian Food Processing Sector_2016
Indian Food Processing Sector_2016
 
Food Processing Industry India
Food Processing Industry IndiaFood Processing Industry India
Food Processing Industry India
 
Food processing industry.
Food processing industry.Food processing industry.
Food processing industry.
 

Similar to The Role of Natural Language Processing in Information Retrieval

Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxAlyaaMachi
 
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEYA DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEYijaia
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing Rajnish Raj
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processingpunedevscom
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search ComponentMario Flecha
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyanrudolf eremyan
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptOlusolaTop
 
Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...write4
 
Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...write5
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
Text Analytics Overview, 2011
Text Analytics Overview, 2011Text Analytics Overview, 2011
Text Analytics Overview, 2011Seth Grimes
 
An Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationAn Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationSeth Grimes
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflowseungwoo kim
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationssChandan Deb
 

Similar to The Role of Natural Language Processing in Information Retrieval (20)

Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
 
Nltk
NltkNltk
Nltk
 
AI_08_NLP.pptx
AI_08_NLP.pptxAI_08_NLP.pptx
AI_08_NLP.pptx
 
REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEYA DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
NLP
NLPNLP
NLP
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search Component
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...
 
Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Text Analytics Overview, 2011
Text Analytics Overview, 2011Text Analytics Overview, 2011
Text Analytics Overview, 2011
 
An Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationAn Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentation
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
 
Nlp
NlpNlp
Nlp
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 

More from Tony Russell-Rose

Visual approaches to patent retrieval
Visual approaches to patent retrievalVisual approaches to patent retrieval
Visual approaches to patent retrievalTony Russell-Rose
 
Towards Explainability in Professional Search
Towards Explainability in Professional SearchTowards Explainability in Professional Search
Towards Explainability in Professional SearchTony Russell-Rose
 
Putting search theory to work on large datasets
Putting search theory to work on large datasetsPutting search theory to work on large datasets
Putting search theory to work on large datasetsTony Russell-Rose
 
NLP techniques for automated query suggestions
NLP techniques for automated query suggestionsNLP techniques for automated query suggestions
NLP techniques for automated query suggestionsTony Russell-Rose
 
Think outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulationThink outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulationTony Russell-Rose
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingTony Russell-Rose
 
Designing the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of DiscoveryDesigning the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of DiscoveryTony Russell-Rose
 
User requirements for complex search strategies
User requirements for complex search strategiesUser requirements for complex search strategies
User requirements for complex search strategiesTony Russell-Rose
 
Sentiment analysis in healthcare
Sentiment analysis in healthcareSentiment analysis in healthcare
Sentiment analysis in healthcareTony Russell-Rose
 
A Model of Consumer Search Behaviour
A Model of Consumer Search BehaviourA Model of Consumer Search Behaviour
A Model of Consumer Search BehaviourTony Russell-Rose
 
A taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implicationsA taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implicationsTony Russell-Rose
 
From search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutionsFrom search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutionsTony Russell-Rose
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowTony Russell-Rose
 
UI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information DiscoveryUI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information DiscoveryTony Russell-Rose
 

More from Tony Russell-Rose (17)

Visual approaches to patent retrieval
Visual approaches to patent retrievalVisual approaches to patent retrieval
Visual approaches to patent retrieval
 
Towards Explainability in Professional Search
Towards Explainability in Professional SearchTowards Explainability in Professional Search
Towards Explainability in Professional Search
 
Putting search theory to work on large datasets
Putting search theory to work on large datasetsPutting search theory to work on large datasets
Putting search theory to work on large datasets
 
NLP techniques for automated query suggestions
NLP techniques for automated query suggestionsNLP techniques for automated query suggestions
NLP techniques for automated query suggestions
 
Think outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulationThink outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulation
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Designing the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of DiscoveryDesigning the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of Discovery
 
User requirements for complex search strategies
User requirements for complex search strategiesUser requirements for complex search strategies
User requirements for complex search strategies
 
Sentiment analysis in healthcare
Sentiment analysis in healthcareSentiment analysis in healthcare
Sentiment analysis in healthcare
 
A Model of Consumer Search Behaviour
A Model of Consumer Search BehaviourA Model of Consumer Search Behaviour
A Model of Consumer Search Behaviour
 
A Taxonomy of Site Search
A Taxonomy of Site SearchA Taxonomy of Site Search
A Taxonomy of Site Search
 
Patterns of Personalization
Patterns of PersonalizationPatterns of Personalization
Patterns of Personalization
 
A taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implicationsA taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implications
 
From search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutionsFrom search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutions
 
From Search to Discovery
From Search to DiscoveryFrom Search to Discovery
From Search to Discovery
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and Tomorrow
 
UI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information DiscoveryUI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information Discovery
 

Recently uploaded

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

The Role of Natural Language Processing in Information Retrieval

  • 1. The Role of Natural Language Processingin Information Retrieval Searching for Meaning in Text Tony Russell-Rose, PhD 21-Mar-2011
  • 2. Contents Introduction & Overview Background, Terminology NLP Fundamentals Paradigms & Perspectives NLP Tools & Techniques NLP Applications Text mining, Question Answering, etc. Conclusions & Further Reading 21-Mar-2011 2 NLP & IR
  • 3. Introduction & Overview The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 3 NLP & IR
  • 4. Information Retrieval Models, techniques & frameworks for satisfying an information need Common paradigm: document retrieval 21-Mar-2011 4 NLP & IR
  • 5. Document Retrieval Methods include Boolean, vector space, probabilistic Rely on index terms “bag of words” Stop lists + stemming But text is “unstructured” Information may be “hidden” 21-Mar-2011 5 NLP & IR
  • 6. NLP – a solved problem? As humans we do it effortlessly ... don’t we? DRUNK GETS NINE YEARS IN VIOLIN CASE PROSTITUTES APPEAL TO POPE STOLEN PAINTING FOUND BY TREE RED TAPE HOLDS UP NEW BRIDGE DEER KILL 300,000 RESIDENTS CAN DROP OFF TREES INCLUDE CHILDREN WHEN BAKING COOKIES MINERS REFUSE TO WORK AFTER DEATH   21-Mar-2011 6 NLP & IR
  • 7. Problems with Text (1) Polysemy One word maps to many concepts e.g. bat Synonymy One concept maps to many words 21-Mar-2011 7 NLP & IR
  • 8. Problems with Text (2) Word order Venetian blind vs. Blind venetian 21-Mar-2011 8 NLP & IR
  • 9. Problems with Text (3) Language is generative Starbucks coffee is the best The place I like most when I need to feed my caffeine addiction is the company from Seattle with branches everywhere Many different ways to express a given idea Synonymy, paraphrase, metaphor, etc. 21-Mar-2011 9 NLP & IR
  • 10.
  • 11. The meaning of a sentence is completely determined by the meaning of its symbols and the syntax used to combine themLanguage is a form of communication All communication has a context: time and place of utterance, the writer, the reader, their backgroundknowledge, intentions, assumptions re the reader’s knowledge/intentions, etc. 21-Mar-2011 10 NLP & IR
  • 12. Problems with Text (5) Language is changing I want to buy a mobile Ill-formed input “accomodation office” Co-ordination, negation, etc. This is not a talk about neuro-linguistic programming Multi-linguality Claudia Schiffer is on the cover of Elle Also sarcasm, irony, slang, jargon, etc. That was a wicked lecture Yep – the coffee break was the best part 21-Mar-2011 11 NLP & IR
  • 13. Enter NLP / Text Analytics… Text Analytics: a set of linguistic, analytical and predictive techniques to extract structure and meaning from unstructured documents NLP: the academic term for Text Analytics analogous to “search” vs. “IR” Text Analytics ~= Natural Language Processing ~= Text Mining Text Mining -> Scientific / technical context, automated processing Text Analytics -> Business context, interactive apps 21-Mar-2011 12 NLP & IR
  • 14. NLP Fundamentals The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 13 NLP & IR
  • 15. NLP Perspectives Computational Linguistics Use of computational techniques to study linguistic phenomena Cognitive science Study of human information processing (perception, language, reasoning, etc.) Computer science Theoretical foundations of computation and practical techniques for implementation Information science Analysis, classification, manipulation, retrieval and dissemination of information 21-Mar-2011 14 NLP & IR
  • 16. NLP Paradigms Symbolic approaches Rule-based, hand coded (by linguists) Knowledge-intensive Statistical approaches Shallow statistical models, trained on annotated corpora Data-intensive 21-Mar-2011 15 NLP & IR
  • 17. NLP Fundamentals (word level) Language is AMBIGUOUS To find structure, we must remove ambiguity! Lexical analysis (tokenisation) The cat sat on the mat I can’t tokenise this sentence Stop word removal No definitive list The Who, The The, Take That… To be or not to be 21-Mar-2011 16 NLP & IR
  • 18. NLP Fundamentals (word level) Stemming fishing, fished, fish, fisher -> fish Lemmatization Stemming + context Passing -> pass + ING Were -> be + PAST Delegate = de-leg-ate (?) Ratify = rat-ify (?) Morphology (prefixes, suffixes, etc.) Gebäudereinigungsfirmenangestellter -> Gebäude + Reinigung + Firma + Angestellter (building + cleaning + company + employee) 21-Mar-2011 17 NLP & IR
  • 19. NLP Fundamentals (sentence level) Syntax (part of speech tagging) book -> NOUN, VERB that -> DETERMINER flight -> NOUN Book that flight -> VB DT NN Ambiguity problem Time flies like an arrow -> ? Fruit flies like a banana -> ? Eats shoots and leaves -> ? 21-Mar-2011 18 NLP & IR
  • 20. NLP Fundamentals (sentence level) Parsing (grammar) I saw a venetian blind I saw a blind venetian I saw the man on the hill with a telescope Rugby is a game played by men with odd-shaped balls Sentence boundary detection Punctuation denotes the end of a sentence! “But not always!”, said Fred... 21-Mar-2011 19 NLP & IR
  • 21. NLP Tools & Techniques The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 20 NLP & IR
  • 22. Early NLP Systems ELIZA Wiezenbaum 1966 Simple pattern matching PARRY Colby et al 1971 Pattern matching & planning SHRDLU Winograd 1972 Natural language understanding Comprehensive grammar of English 21-Mar-2011 21 NLP & IR
  • 23. Word Prediction Assistive technologies Aurora Systems, TextHelp Google, Bing, Yahoo query suggestions 21-Mar-2011 22 NLP & IR
  • 24. Spelling Correction Auto-correct Did You Mean 21-Mar-2011 23 NLP & IR
  • 25. Text Categorization News agencies: classify incoming news stories Search engines: classify queries, e.g. search Google for ‘LOG313’ Identifying spam emails, e.g. http://www.paulgraham.com/spam.html Routing email or documents to appropriate people 21-Mar-2011 24 NLP & IR
  • 26. Terminology Extraction Differentiate between useful index terms and ‘noise’ Help lexicographers identify new terminology: the C4 carbons of the nicotinamide rings X-ray therapy vs. X-ray therapies PAHO vs. Pan-American Health Organization Term extraction systems process scientific papers to identify terminology, possibly comparing it with a known list 21-Mar-2011 25 NLP & IR
  • 27. Speech Recognition Spoken Dialogue Systems Directory enquiries (AT&T) Railways (Germany, Switzerland, Italy) Airlines (USA) iPhone Voice Search IBM Watson 21-Mar-2011 26 NLP & IR
  • 28. Named Entity Recognition Identification of key concepts, e.g. people, places, organisations, etc. Also postcodes, temporal/numerical expressions, etc. “Mexico has been trying to stage a recovery since the beginning of this year and it's always been getting ahead of itself in terms of fundamentals,” said Matthew Hickman of Lehman Brothers in New York.” 21-Mar-2011 27 NLP & IR
  • 29. Named Entity Recognition: Applications Increase precision of IR New companies in York vs. Companies in New York Support navigation SMART tags, Apture, etc. Improve machine translation: “I live in a new house in new york” Speech synthesis, auto-summarisation, etc. 21-Mar-2011 28 NLP & IR
  • 30. Named Entity Recognition Systems usually developed for specific domain Are typically language and domain-specific Accuracy > 90% for newswire text Best system at MUC-7 scored 93.39% f-measure human annotators scored 97.60% and 96.95%. Lower for certain domains, e.g. life sciences (species, substances, proteins, genes, etc.) extended naming conventions, extensive use of conjunction and disjunction, high occurrence of neologisms and abbreviations, etc. 21-Mar-2011 29 NLP & IR
  • 31. NER (Newssift Example) 21-Mar-2011 30 NLP & IR
  • 32. Machine Translation (before) 21-Mar-2011 31 NLP & IR
  • 33. Machine Translation (after) 21-Mar-2011 32 NLP & IR
  • 34. Summarisation Section 3 discusses two information access applications (text mining and question answering) closely associated with NLP. NLP Techniques Named entity recognition Information Extraction Current document retrieval technologies could not identify information as specific as this within text. Word Sense Disambiguation Text Mining Question Answering “Wh” questions List questions Examples include those mentioned here: text mining, question answering and cross-language information retrieval. Ponte and Croft (1998) “A Language Modeling Approach to Information Retrieval” SIGIR.   Sanderson, M. (1994) “Word sense disambiguation and information retrieval” Proceedings of the 17th ACM SIGIR Conference   Sanderson, M. (2000) “Retrieving with Good Sense” Information Retrieval 2(1):49-69.   Question Answering (Section 3.2).   Summarization single-document vs. multi-document MS Word Bing 21-Mar-2011 33 NLP & IR
  • 35.
  • 36. Based on pre-defined structures, e.g.movements of company executives victims of terrorist attacks corporate mergers / acquisitions interactions between genes and proteins Example (from MUC-6): Now, Mr. James is preparing to sail into the sunset, and Mr. Dooner is poised to rev up the engines to guide Interpublic Group's McCann-Erickson into the 21st century. Yesterday, McCann made official what had been widely anticipated: Mr. James, 57 years old, is stepping down as chief executive officer on July 1 and will retire as chairman at the end of the year. He will be succeeded by Mr. Dooner, 45. 21-Mar-2011 34 NLP & IR
  • 37. Information Extraction <SUCCESSION_EVENT-2> := SUCCESSION_ORG: <ORGANIZATION-1> POST: "chairman" IN_AND_OUT: <IN_AND_OUT-4> VACANCY_REASON: DEPART_WORKFORCE   <IN_AND_OUT-4> := IO_PERSON: <PERSON-1> NEW_STATUS: IN ON_THE_JOB: NO OTHER_ORG: <ORGANIZATION-1> REL_OTHER_ORG: SAME_ORG   <ORGANIZATION-1> := ORG_NAME: "McCann-Erickson" ORG_ALIAS: "McCann" ORG_TYPE: COMPANY   <PERSON-1> := PER_NAME: "John J. Dooner Jr." PER_ALIAS: "John Dooner" "Dooner" 21-Mar-2011 35 NLP & IR
  • 38. Information Extraction Can now use as metadata (for retrieval) Or store in DB and query against it, e.g. “show me all the events where a finance officer left a company to take up the position of CEO in another company” 21-Mar-2011 36 NLP & IR
  • 39. IE Example: Yahoo Correlator 21-Mar-2011 37 NLP & IR
  • 40. IE Example: Google Wonder Wheel 21-Mar-2011 38 NLP & IR
  • 41. IE Example: Yahoo Search Assist 21-Mar-2011 39 NLP & IR
  • 42. Information Extraction But IE is difficult when concepts are spread across multiple sentences: Pace American Group Inc. said it notified two top executives it intends to dismiss them because an internal investigation found evidence of “self-dealing” and “undisclosed financial relationships.” The executives are Don H. Pace, president and CEO; and Greg S. Kaplan, SVP and CFO. Event & org in first S, but names in second S Need to identify connection between “two top executives” and “the executives”. 21-Mar-2011 40 NLP & IR
  • 43. Information Extraction Anaphora resolution relies on context (semantic / pragmatic knowledge): We gave the bananas to the monkeys because they were hungry. We gave the bananas to the monkeys because they were ripe. We gave the bananas to the monkeys because they were here. 21-Mar-2011 41 NLP & IR
  • 44. WordNet Semantic database for English See also EuroWordNet Based on synsets car = automobile | railway car | elevator car | cable car 1stsynset = car, motorcar, machine, auto, automobile Connected via relationships E.g. hypernymy (a kind of) separate hierarchies for nouns, verbs, adjectives and adverbs 21-Mar-2011 42 NLP & IR
  • 45. Word Sense Disambiguation Resolve ambiguities due to polysemy Often characterised by syntax, e.g. Magnesium is a light metal (ADJ) The light in the kitchen is quite bright (NOUN) Can use a Part of Speech Tagger to resolve broad ambiguities PoS tags = NN, NNP, NNS, NNPS, etc. 21-Mar-2011 43 NLP & IR
  • 46.
  • 47. 22. to adjust (a timer, alarm of a clock, etc.) so as to sound when desired: He set the alarm for seven o'clock.
  • 48. 23. to fix or mount (a gem or the like) in a frame or setting.
  • 49. 24. to ornament or stud with gems or the like: a bracelet set with pearls.
  • 50. 25. to cause to sit; seat: to set a child in a highchair.
  • 51. 26. to put (a hen) on eggs to hatch them.
  • 52. 27. to place (eggs) under a hen or in an incubator for hatching.
  • 53. 28. to place or plant firmly: to set a flagpole in concrete.
  • 54. 29. to put into a fixed, rigid, or settled state, as the face, muscles, etc.
  • 55. 30. to fix at a given point or calibration: to set the dial on an oven; to set a micrometer.
  • 56. 31. to tighten (often followed by up ): to set nuts well up.
  • 57. 32. to cause to take a particular direction: to set one's course to the south.
  • 58. 33. Surgery . to put (a broken or dislocated bone) back in position.
  • 59. 34. (of a hunting dog) to indicate the position of (game) by standing stiffly and pointing with the muzzle.
  • 60. 35. Music . a. to fit, as words to music.
  • 61. b. to arrange for musical performance.
  • 62. c. to arrange (music) for certain voices or instruments.
  • 63. 36. Theater . a. to arrange the scenery, properties, lights, etc., on (a stage) for an act or scene.
  • 64. b. to prepare (a scene) for dramatic performance.
  • 65. 37. Nautical . to spread and secure (a sail) so as to catch the wind.
  • 66. 38. Printing . a. to arrange (type) in the order required for printing.
  • 67. b. to put together types corresponding to (copy); compose in type: to set an article.
  • 68. 39. Baking . to put aside (a substance to which yeast has been added) in order that it may rise.
  • 69. 40. to change into curd: to set milk with rennet.
  • 70. 41. to cause (glue, mortar, or the like) to become fixed or hard.
  • 71. 42. to urge, goad, or encourage to attack: to set the hounds on a trespasser.
  • 72. 43. Bridge . to cause (the opposing partnership or their contract) to fall short: We set them two tricks at four spades. Only perfect defense could set four spades.
  • 73. 44. to affix or apply, as by stamping: The king set his seal to the decree.
  • 74. 45. to fix or engage (a fishhook) firmly into the jaws of a fish by pulling hard on the line once the fish has taken the bait.
  • 75. 46. to sharpen or put a keen edge on (a blade, knife, razor, etc.) by honing or grinding.
  • 76. 47. to fix the length, width, and shape of (yarn, fabric, etc.).
  • 77. 48. Carpentry . to sink (a nail head) with a nail set.
  • 78. 49. to bend or form to the proper shape, as a saw tooth or a spring.
  • 79. 50. to bend the teeth of (a saw) outward from the blade alternately on both sides in order to make a cut wider than the blade itself.
  • 80. –verb (used without object) 51. to pass below the horizon; sink: The sun sets early in winter.
  • 82. 53. to assume a fixed or rigid state, as the countenance or the muscles.
  • 83. 54. (of the hair) to be placed temporarily on rollers, in clips, or the like, in order to assume a particular style: Long hair sets more easily than short hair.
  • 84. 55. to become firm, solid, or permanent, as mortar, glue, cement, or a dye, due to drying or physical or chemical change.
  • 85. 56. to sit on eggs to hatch them, as a hen.
  • 86. 57. to hang or fit, as clothes.
  • 87. 58. to begin to move; start (usually followed by forth, out, off,  etc.).
  • 88. 59. (of a flower's ovary) to develop into a fruit.
  • 89. 60. (of a hunting dog) to indicate the position of game. 21-Mar-2011 44 NLP & IR
  • 90.
  • 91. 77. an apparatus for receiving radio or television programs; receiver.
  • 92. 78. Philately . a group of stamps that form a complete series.
  • 93. 79. Tennis . a unit of a match, consisting of a group of not fewer than sixgames with a margin of at least two games between the winner and loser: He won the match in straight sets of 6–3, 6–4, 6–4.
  • 94. 80. a construction representing a place or scene in which the action takes place in a stage, motion-picture, or television production.
  • 95. 81. Machinery . a. the bending out of the points of alternate teeth of a saw in opposite directions.
  • 96. b. a permanent deformation or displacement of an object or part.
  • 97. c. a tool for giving a certain form to something, as a saw tooth.
  • 98. 82. a chisel having a wide blade for dividing bricks.
  • 99. 83. Horticulture . a young plant, or a slip, tuber, or the like, suitable for planting.
  • 100. 84. Dance . a. the number of couples required to execute a quadrille or the like.
  • 101. b. a series of movements or figures that make up a quadrille or the like.
  • 102. 85. Music . a. a group of pieces played by a band, as in a night club, and followed by an intermission.
  • 103. b. the period during which these pieces are played.
  • 104. 86. Bridge . a failure to take the number of tricks specified by one's contract: Our being vulnerable made the set even more costly.
  • 105. 87. Nautical . a. the direction of a wind, current, etc.
  • 106. b. the form or arrangement of the sails, spars, etc., of a vessel.
  • 107. c. suit ( def. 12 ) .
  • 108. 88. Psychology . a temporary state of an organism characterized by a readiness to respond to certain stimuli in a specific way.
  • 109. 89. Mining . a timber frame bracing or supporting the walls or roof of a shaft or stope.
  • 110. 90. Carpentry . nail set.
  • 111. 90. Carpentry . nail set.
  • 112. 91. Mathematics . a collection of objects or elements classed together.
  • 113. 92. Printing . the width of a body of type.
  • 114. 93. sett ( def. 3 ) .
  • 115. –adjective 94. fixed or prescribed beforehand: a set time; set rules.
  • 116. 95. specified; fixed: The hall holds a set number of people.
  • 117. 96. deliberately composed; customary: set phrases.
  • 118. 97. fixed; rigid: a set smile.
  • 119. 98. resolved or determined; habitually or stubbornly fixed: to be set in one's opinions.
  • 120. 99. completely prepared; ready: Is everyone set?
  • 121. –interjection 100. (in calling the start of a race): Ready! Set! Go! Also, get set! 21-Mar-2011 45 NLP & IR
  • 122. Word Sense Disambiguation Surely PoS tagging would help IR performance? Could index docs using words AND PoS tags Very sensitive to tagger performance Does WSD improve IR performance? Conflicting evidence …“Perfect” WSD engine may provide only marginal gain, e.g. >76% accuracy needed for homonymy >55% for polysemy 21-Mar-2011 46 NLP & IR
  • 123. NLP Applications The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 47 NLP & IR
  • 124. NLP Applications Web & enterprise search Text classification Text summarisation Machine translation Human-computer interfaces (spelling correction etc.) Speech recognition & synthesis Natural language generation (e.g. in games) Text mining Question answering 21-Mar-2011 48 NLP & IR
  • 125. Finally becoming mainstream? ‘80% of corporate information is unstructured’ Entire value chain for some organisation (media / publishing etc.) Retail / eCommerce: Product reviews User generated content: blogs, forums, wikis Voice of the Customer: social media + sentiment analysis 161 billion gigabytes of digital information in 2006 approximately 988 exabytes by 2010 Audio / video still needs summaries & tags etc. 21-Mar-2011 49 NLP & IR
  • 126. Market Outlook Combine component technologies e.g. PoS tagging + NER, etc. Healthy growth: massive expansion in social media Lightweight NLP for buzz analysis, brand monitoring, etc. Dominant markets: Customer Experience (VoC), media/publishing, financial services & insurance, intelligence, life sciences, e-discovery Solutions still not standardized Need for self-service tuning & configuration Partner ecosystem developing Marketing services providers, platform vendors, CRM + call centre vendors, system integrators 21-Mar-2011 50 NLP & IR
  • 127. Text Mining Analogy with Data Mining Discover or infer new knowledge from unstructured text resources A <-> B and B <-> C Infer A <-> C? e.g. link between migraine headaches and magnesium deficiency Applications in life sciences, media/publishing, counter terrorism and competitive intelligence 21-Mar-2011 51 NLP & IR
  • 128. Text Mining Levels of text mining, e.g. for bioscience: Tagging entities: e.g. genes, proteins, diseases and chemical compounds Information extraction: identify meaningful structural relations within the text (e.g. interactions between proteins) Knowledge discovery: combine with other knowledge sources (such as external ontologies) to infer or discover new knowledge 21-Mar-2011 52 NLP & IR
  • 129. Question Answering Going beyond the document retrieval paradigm provide specific answers to specific questions Introduced as a TREC track in 1999 21-Mar-2011 53 NLP & IR
  • 130. Question Answering Yes/no questions “Is George W. Bush the current president of the USA?” “Is the Sea of Tranquility deep?” “Wh” questions “Who was the British Prime Minister before Margaret Thatcher?” “When was the Battle of Hastings?” List questions “Which football teams have won the Champions League this decade?” “Which roads lead to Rome?” Instruction-based questions “How do I cook lasagne?”, “What is the best way to build a bridge?” Explanation questions “Why did World War I start?” “How does a computer process floating point numbers?” Commands “Tell me the height of the Eiffel Tower.” “Name all the Kings of England” 21-Mar-2011 54 NLP & IR
  • 131. Question Answering Common use cases handled by major search engines Google, Yahoo!, Bing, Baidu, ... Specialist QA systems Wolfram Alpha: http://www.wolframalpha.com True Knowledge: http://www.trueknowledge.com Powerset: http://www.powerset.com/ IBM Watson 21-Mar-2011 55 NLP & IR
  • 132. Question Answering Basic approach: question analysis Predict type of answer expected Create query document retrieval answer extraction Based on output of (1) Varies from regex to deep linguistic analysis 21-Mar-2011 56 NLP & IR
  • 133. Evaluation How do we measure NLP accuracy? Compare against human annotation However: Humans often disagree Particularly for complex tasks Comparison can be measured in many ways “Bill Gates is chairman of Microsoft” -> “Gates” = person? 21-Mar-2011 57 NLP & IR
  • 134. Conclusions The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 58 NLP & IR
  • 135. Conclusions NLP’s contribution to IR not always apparent Overemphasis on precision & recall? NLP provides benefits that go beyond quantitative metrics to support broader tasks & richer experiences Insight, analysis, visualisation, etc. NLP required for text mining, question answering & cross-language IR Bag of words model is insufficient Web becoming increasingly multi-lingual 21-Mar-2011 59 NLP & IR
  • 136. State of the Art Commodity technologies: Tokenisation, normalization, regular expression search Mainstream, but room for improvement: Lemmatization, spelling correction, speech synthesis Mainstream but needs improvement: PoS tagging, term extraction, summarization, speech recognition, text categorization, spoken dialogue systems Almost there: Information extraction, NL generation, simple speech translation Don’t hold your breath: Full machine translation, advanced dialogue systems 21-Mar-2011 60 NLP & IR
  • 137. Platforms & Toolkits Many commercialvendors Open source/access: GATE (Sheffield University) RapidMiner Stanford NLP OpenNLP OpenCalais LingPipe nltk 21-Mar-2011 61 NLP & IR
  • 138. Further Reading A. Goker & J. Davies, Information Retrieval: Searching in the 21st Century. Wiley-Blackwell, 2009 D. Jurafsky and J. Martin, Speech and Language Processing. Prentice-Hall, 2nd edition, 2008. Manning, C. and Schutze, H. Foundations of Statistical Language Processing. MIT Press, 1999. 21-Mar-2011 62 NLP & IR
  • 139. Thank you Tony Russell-Rose, PhD Vice-chair, BCS IRSG Chair, IEHF HCI Group Email: tgr@uxlabs.co.uk Blog: http://isquared.wordpress.com LinkedIn: http://www.linkedin.com/tonyrussellrose/ Twitter: @tonygrr

Editor's Notes

  1. Exercise: experiment with the 3 major search engines (Google, Bing, Yahoo) to see how they handle stop words. Try queries that are dominated by stop words, with and without quotes. How do they differ?
  2. Exercise: experiment with the 3 major search engines (Google, Bing, Yahoo) to see how they handle stemming. Try queries that are dominated by inflected forms, with and without quotes. How do they differ?
  3. Do as class exercise
  4. Demo: ELIZA http://nlp-addiction.com/eliza/
  5. Demo Google suggest
  6. Demo Auto-correct &amp; DYM
  7. Demo: try Google Translate (or Babelfish) with sentences containing ambiguous terms or NEs, e.g. “I live in a new house in new york”. Use the speech synthesizerhttp://translate.google.com/#en|de|I%20live%20in%20a%20new%20house%20in%20new%20york
  8. Demo Yahoo search assist + “explore concepts”
  9. Paired exercise: what is the most polysemous English word you can think of? How many senses do you think it has in an average dictionary? Think of some polysemous words (examples include “ball”, “bank”, “port” but there are many others). Use these as queries in a search engine and comment on what you notice about the results.
  10. Paired Exercise: try each of above queries in Google, Yahoo, Bing. Now try to answer the same questions with a normal web search (by entering a set of query terms instead of a natural language question). Are the results of this search any better or worse?
  11. Paired exercise: try previous queries on these sites