SlideShare a Scribd company logo
1 of 44
Dr.Tony Russell-Rose, FBCS CITP CEng
Tony Russell-Rose, PhD
Director, UXLabs
 Led applied research & technology innovation at Microsoft, Canon, Reuters, BT
Labs, HP Labs, Endeca
 Visiting Professor of Cognitive Computing & AI, Essex University. Founder of
Search Solutions conference series
 Founder, 2dSearch: next generation advanced search
 PhD (natural language interfaces), MSc in human-computer interaction, first
degree in engineering (human factors)
 5 patents for search user interface design. 80+ scientific and technical papers
on AI/NLP, information retrieval and user experience
 Honorary Visiting Fellow at City University London (Centre for Interactive
Systems Research)
 Blog: http://isquared.wordpress.com
Industry Expertise
 Media & Publishing
 Healthcare
 Business Intelligence
 Financial Services
 eCommerce
Domain Expertise
 UX research
 Information architecture
 Search & information
retrieval
 AI + machine learning
2Intro to Natural Language Processing
 Course outline
 Background & terminology
 Fundamentals
 Techniques & applications
 Further reading
16-Jan-19 Intro to Natural Language Processing 3
16-Jan-19 Intro to Natural Language Processing 4
Week Lecture Practical Topic
1 14-Jan-2019 16-Jan-2019 Intro
2 21-Jan-2019 23-Jan-2019 Basic text processing
3 28-Jan-2019 30-Jan-2019 Language modelling
4 4-Feb-2019 6-Feb-2019 Text classification
5 11-Feb-2019 13-Feb-2019 Sentiment analysis
6 18-Feb-2019 20-Feb-2019 POS tagging
7 25-Feb-2019 27-Feb-2019 Parsing
8 4-Mar-2019 6-Mar-2019 Information Extraction
11-Mar-2019 13-Mar-2019
9 18-Mar-2019 20-Mar-2019 Information retrieval
10 25-Mar-2019 27-Mar-2019 Lexical semantics
11 1-Apr-2019 3-Apr-2019 Word embeddings
12 8-Apr-2019 10-Apr-2019 Question answering
13 15-Apr-2019 17-Apr-2019 Summarization
22-Apr-2019 24-Apr-2019
14 29-Apr-2019 EXAMS Revision
 As humans we do it effortlessly ... don’t we?
 DRUNK GETS NINEYEARS INVIOLIN CASE
 PROSTITUTESAPPEALTO POPE
 STOLEN PAINTING FOUND BYTREE
 REDTAPE HOLDS UP NEW BRIDGE
 DEER KILL 300,000
 RESIDENTS CAN DROP OFFTREES
 INCLUDE CHILDRENWHEN BAKING COOKIES
 MINERS REFUSETOWORK AFTER DEATH
16-Jan-19 Intro to Natural Language Processing 5
 Polysemy
 One word maps to
many concepts
 e.g. bat
 Synonymy
 One concept maps to
many words
16-Jan-19 Intro to Natural Language Processing 6
 Word order
 Venetian blind vs.
Blind venetian
16-Jan-19 Intro to Natural Language Processing 7
 Language is generative
 Starbucks coffee is the best
 The place I like most when I need to feed my caffeine
addiction is the company from Seattle with branches
everywhere
 Many different ways to express a given idea
 Synonymy, paraphrase, metaphor, etc.
16-Jan-19 Intro to Natural Language Processing 8
 Language is changing
 I want to buy a mobile
 Ill-formed input
 “accomodation office”
 Co-ordination, negation, etc.
 This is not a talk about neuro-linguistic programming
 Multi-linguality
 Claudia Schiffer is on the cover of Elle
 Sarcasm, irony, slang, jargon, etc.
 That was a wicked lecture
 Yep – the coffee break was the best part
16-Jan-19 Intro to Natural Language Processing 9
Dan Jurafsky
non-standard English
Great job @justinbieber! Were
SOO PROUD of what youve
accomplished! U taught us 2
#neversaynever & you yourself
should never give up either♥
segmentation issues idioms
dark horse
get cold feet
lose face
throw in the towel
neologisms
unfriend
Retweet
bromance
tricky entity names
Where is A Bug’s Life playing …
Let It Be was recorded …
… a mutation on the for gene …
world knowledge
Mary and Sue are sisters.
Mary and Sue are mothers.
But that’s what makes it fun!
the New York-New Haven Railroad
the New York-New Haven Railroad
Why else is natural language
understanding difficult?
 Text Analytics: a set of linguistic, analytical and
predictive techniques to extract structure and meaning
from unstructured documents
 NLP: language can be spoken as well as written
 Text Analytics ~= Natural Language Processing ~=Text Mining
16-Jan-19 Intro to Natural Language Processing 11
 Computational Linguistics
 Use of computational techniques to study linguistic
phenomena
 Cognitive science
 Study of human information processing (perception,
language, reasoning, etc.)
 Computer science
 Theoretical foundations of computation and practical
techniques for implementation
 Information science
 Analysis, classification, manipulation, retrieval and
dissemination of information
16-Jan-19 Intro to Natural Language Processing 12
 Symbolic approaches
 Rule-based, hand coded (by linguists/SMEs)
 Knowledge-intensive
 Statistical approaches
 Distributional & neural approaches, supervised or
unsupervised
 Data-intensive
16-Jan-19 Intro to Natural Language Processing 13
 Language is AMBIGUOUS
 To determine structure, we must resolve ambiguity!
 Lexical analysis (tokenisation)
 The cat sat on the mat
 I can’t tokenise this sentence
 Stop word removal
 No definitive list
 The Who, The The, Take That…
 To be or not to be
16-Jan-19 Intro to Natural Language Processing 14
 Stemming
 fishing, fished, fish, fisher -> fish
 Lemmatization
 Linguistically principled analysis
 Passing -> pass + ING
 Were -> be + PAST
 Delegate = de-leg-ate (?)
 Ratify = rat-ify (?)
 Morphology (prefixes, suffixes, etc.)
 Gebäudereinigungsfirmenangestellter -> Gebäude +
Reinigung + Firma + Angestellter (building + cleaning +
company + employee)
16-Jan-19 Intro to Natural Language Processing 15
 Syntax (part of speech tagging)
 book -> NOUN, VERB
 that -> DETERMINER
 flight -> NOUN
 Book that flight -> VB DT NN
 Ambiguity problem
 Time flies like an arrow -> ?
 Fruit flies like a banana -> ?
 Eats shoots and leaves -> ?
16-Jan-19 Intro to Natural Language Processing 16
 Parsing (grammar)
 I saw a venetian blind
 I saw a blind venetian
 I saw the man on the hill with
a telescope
 Rugby is a game played by men
with odd-shaped balls
 Sentence boundary
detection
 Punctuation denotes the end of
a sentence!
 “But not always!”, said Fred...
16-Jan-19 Intro to Natural Language Processing 17
 ELIZA
 Wiezenbaum 1966
 Simple pattern matching
 PARRY
 Colby et al 1972
 Pattern matching & planning
 SHRDLU
 Winograd 1972
 Natural language understanding
 Comprehensive grammar of English
16-Jan-19 Intro to Natural Language Processing 18
 Differentiate between useful index terms and
‘noise’
 Help lexicographers identify new
terminology:
 the C4 carbons of the nicotinamide rings
 X-ray therapy vs. X-ray therapies
 PAHO vs. Pan-American Health Organization
 Term extraction systems process scientific
papers to identify terminology
16-Jan-19 Intro to Natural Language Processing 19
 Identification of key concepts,
 e.g. people, places, organisations, etc.
 Also postcodes, temporal/numerical expressions, etc.
 “Mexico has been trying to stage a recovery since the
beginning of this year and it's always been getting
ahead of itself in terms of fundamentals,” said Matthew
Hickman of Lehman Brothers in NewYork.”
16-Jan-19 Intro to Natural Language Processing 20
Persons Organisations Cities Countries
Matthew Hickman Lehman Brothers NewYork Mexico
 Entities + relationships
 Based on pre-defined structures, e.g.
 movements of company executives
 corporate mergers / acquisitions
 interactions between genes and proteins
 Example (from MUC-6):
 Now, Mr.James is preparing to sail into the sunset, and Mr. Dooner is poised to
rev up the engines to guide Interpublic Group's McCann-Erickson into the 21st
century.Yesterday, McCann made official what had been widely anticipated: Mr.
James, 57 years old, is stepping down as chief executive officer on July 1 and will
retire as chairman at the end of the year. He will be succeeded by Mr. Dooner, 45.
16-Jan-19 Intro to Natural Language Processing 21
<SUCCESSION_EVENT-2> :=
SUCCESSION_ORG:
<ORGANIZATION-1>
POST: "chairman"
IN_AND_OUT: <IN_AND_OUT-4>
VACANCY_REASON: DEPART_WORKFORCE
<IN_AND_OUT-4> :=
IO_PERSON: <PERSON-1>
NEW_STATUS: IN
ON_THE_JOB: NO
OTHER_ORG: <ORGANIZATION-1>
REL_OTHER_ORG: SAME_ORG
<ORGANIZATION-1> :=
ORG_NAME: "McCann-Erickson"
ORG_ALIAS: "McCann"
ORG_TYPE: COMPANY
<PERSON-1> :=
PER_NAME: "John J. Dooner Jr."
PER_ALIAS: "John Dooner" "Dooner"
16-Jan-19 Intro to Natural Language Processing 22
 Can now use as metadata (for retrieval)
 Or store in DB and query against it, e.g.
 “show me all the events where a finance officer left
a company to take up the position of CEO in another
company”
16-Jan-19 Intro to Natural Language Processing 23
Dan Jurafsky
Information Extraction
Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky
Hi Dan, we’ve now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris
24
Create new Calendar entry
Event: Curriculum mtg
Date: Jan-16-2012
Start: 10:00am
End: 11:30am
Where: Gates 159
 Anaphora resolution relies on context
(semantic / pragmatic knowledge):
 We gave the bananas to the monkeys because they were hungry.
 We gave the bananas to the monkeys because they were ripe.
 We gave the bananas to the monkeys because they were here.
16-Jan-19 Intro to Natural Language Processing 25
 Resolve ambiguities due to polysemy
 Often characterised by syntax, e.g.
 Magnesium is a light metal (ADJ)
 The light in the kitchen is quite bright (NOUN)
 Can use a Part of SpeechTagger to resolve
broad ambiguities
 PoS tags = NN, NNP, NNS, NNPS, etc.
16-Jan-19 Intro to Natural Language Processing 26
 How do we measure NLP accuracy?
 Compare against human annotation
 However:
 Humans often disagree
▪ Particularly for complex tasks
▪ IAA places an upper bound on performance
 Comparison can be measured in many ways
▪ “Bill Gates is chairman of Microsoft” -> “Gates” = person?
16-Jan-19 Intro to Natural Language Processing 27
 Human-computer interfaces (chatbots etc.)
 Text classification
 Text summarisation
 Machine translation
 Speech recognition & synthesis
 Natural language generation
 Text mining
 Question answering
 Sentiment Analysis
16-Jan-19 Intro to Natural Language Processing 28
 Media monitoring
 classify incoming news stories
 Search engines:
 classify query intent, e.g. search
Google for ‘LOG313’
 Spam detection
16-Jan-19 Intro to Natural Language Processing 29
Dan Jurafsky
Machine Translation
• Fully automatic
30
• Helping human translators
Enter Source Text:
Translation from Stanford’s Phrasal:
这 不过 是 一 个 时间 的 问题 .
This is only a matter of time.
 Summarization
 single-document vs.
multi-document
 Search results
 MSWord
 Research / analysis
tools
16-Jan-19 Intro to Natural Language Processing 31
 Chatbots
 Smart speakers
 Smartphone assistants
 Call handling systems
 Travel
 Hospitality
 Banking
16-Jan-19 Intro to Natural Language Processing 32
 Analogy with Data Mining
 Discover or infer new knowledge from unstructured
text resources
 A <-> B and B <-> C
 Infer A <-> C?
 e.g. link between migraine headaches and
magnesium deficiency
 Applications in life sciences, media/publishing,
counter terrorism and competitive intelligence
16-Jan-19 Intro to Natural Language Processing 33
 Going beyond the
document retrieval
paradigm
 provide specific answers to
specific questions
 Introduced as aTREC track
in 1999
16-Jan-19 Intro to Natural Language Processing 34
Dan Jurafsky
Question Answering: IBM’s Watson
• Won Jeopardy on February 16, 2011!
35
WILLIAM WILKINSON’S
“AN ACCOUNT OF THE PRINCIPALITIES OF
WALLACHIA AND MOLDOVIA”
INSPIRED THIS AUTHOR’S
MOST FAMOUS NOVEL
Bram Stoker
 Identify and extract subjective information
 Several sub-tasks:
 Identify polarity, e.g. of movie reviews
▪ e.g. positive, negative, or neutral
 Identify emotional states
▪ e.g. angry, sad, happy, etc.
 Subjectivity/objectivity identification
▪ E.g. “fact” from opinion
 Feature/aspect-based
▪ Differentiate between specific features or aspects of entities
16-Jan-19 Intro to Natural Language Processing 36
Dan Jurafsky
Information Extraction & Sentiment Analysis
• nice and compact to carry!
• since the camera is small and light, I won't need to carry
around those heavy, bulky professional cameras either!
• the camera feels flimsy, is plastic and very light in weight you
have to be very delicate in the handling of this camera
37
Size and weight
Attributes:
zoom
affordability
size and weight
flash
ease of use
✓
✗
✓
 Commodity technologies:
 Tokenisation, normalization, regular expression search
 Mainstream, but room for improvement:
 Lemmatization, spelling correction, speech synthesis
 Mainstream but needs improvement:
 PoS tagging, term extraction, summarization, speech recognition,
text categorization, spoken dialogue systems
 Almost there:
 Information extraction, NL generation, simple speech translation
 Don’t hold your breath:
 Full machine translation, advanced dialogue systems
16-Jan-19 Intro to Natural Language Processing 38
Dan Jurafsky
Language Technology
Coreference resolution
Question answering (QA)
Part-of-speech (POS) tagging
Word sense disambiguation
(WSD)
Paraphrase
Named entity recognition (NER)
Parsing
Summarization
Information extraction (IE)
Machine translation (MT)
Dialog
Sentiment analysis
mostly solved
making good progress
still really hard
Spam detection
Let’s go to Agra!
Buy V1AGRA …
✓
✗
Colorless green ideas sleep furiously.
ADJ ADJ NOUN VERB ADV
Einstein met with UN officials in Princeton
PERSON ORG LOC
You’re invited to our dinner
party, Friday May 27 at 8:30
Party
May 27
add
Best roast chicken in San Francisco!
The waiter ignored us for 20 minutes.
Carter told Mubarak he shouldn’t run again.
I need new batteries for my mouse.
The 13th Shanghai International Film Festival…
第13届上海国际电影节开幕…
The Dow Jones is up
Housing prices rose
Economy is
good
Q. How effective is ibuprofen in reducing
fever in patients with acute febrile illness?
I can see Alcatraz from the window!
XYZ acquired ABC yesterday
ABC has been taken over by XYZ
Where is Citizen Kane playing in SF?
Castro Theatre at 7:30. Do
you want a ticket?
The S&P500 jumped
 Many commercial vendors
 Open source/access:
 GATE (Sheffield University)
 RapidMiner
 Stanford NLP
 OpenNLP
 OpenCalais
 LingPipe
 nltk
16-Jan-19 Intro to Natural Language Processing 40
 Recent additions
 Gensim (& word2vec, GloVe, fastText, etc.)
 Textacy + Spacy
 TextBlob
 BERT, ELMo etc.
16-Jan-19 Intro to Natural Language Processing 41
 D. Jurafsky and J. Martin, Speech and Language Processing.
Prentice-Hall, 2nd edition, 2008.
 Bird, Steven, Ewan Klein, and Edward Loper (2009), Natural
Language Processing with Python, O'Reilly Media.
 Markus Hofmann and Andrew Chisholm. 2016.Text Mining
andVisualization: Case Studies Using Open-SourceTools.
Chapman & Hall/CRC.
 Manning, C. and Schutze, H. Foundations of Statistical
Language Processing. MIT Press, 1999.
16-Jan-19 Intro to Natural Language Processing 42
 Install/configure Python (ideally 3.6)
 Install an IDE (e.g PyCharm)
 Set up a venv etc.
 Install nltk + download data
 See Ch 1 Section 1.2
 Work thru Q19 onwards fromCh 1 of nltk
book:
 https://www.nltk.org/book/ch01.html
16-Jan-19 Intro to Natural Language Processing 43
Tony Russell-Rose, PhD FBCS CITP CEng
Director, UXLabs | Founder, 2dSearch
Visiting Professor, Essex University
 Web: uxlabs.co.uk, 2dsearch.com
 Email: tgr@uxlabs.co.uk
 Blog: http://isquared.wordpress.com
 LinkedIn: http://uk.linkedin.com/in/tonyrussellrose
 Twitter: @tonygrr
44Intro to Natural Language Processing

More Related Content

What's hot

DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyanrudolf eremyan
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
Natural Language Processing in Alternative and Augmentative Communication
Natural Language Processing in Alternative and Augmentative CommunicationNatural Language Processing in Alternative and Augmentative Communication
Natural Language Processing in Alternative and Augmentative CommunicationDivya Sugumar
 
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Edureka!
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingSeth Grimes
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 wordsananth
 
Machine learning-and-data-mining-19-mining-text-and-web-data
Machine learning-and-data-mining-19-mining-text-and-web-dataMachine learning-and-data-mining-19-mining-text-and-web-data
Machine learning-and-data-mining-19-mining-text-and-web-dataitstuff
 
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...Tadahiro Taniguchi
 
UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2Yuriy Guts
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextLeon Derczynski
 
Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionAritra Mukherjee
 
September 2021: Top10 Cited Articles in Natural Language Computing
September 2021: Top10 Cited Articles in Natural Language ComputingSeptember 2021: Top10 Cited Articles in Natural Language Computing
September 2021: Top10 Cited Articles in Natural Language Computingkevig
 
Powerful landscape of natural language processing
Powerful landscape of natural language processingPowerful landscape of natural language processing
Powerful landscape of natural language processingPolestarsolutions
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Text Analytics for Security
Text Analytics for SecurityText Analytics for Security
Text Analytics for SecurityTao Xie
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational SemanticsMarina Santini
 
Tutorial: Text Analytics for Security
Tutorial: Text Analytics for SecurityTutorial: Text Analytics for Security
Tutorial: Text Analytics for SecurityTao Xie
 

What's hot (20)

DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
Nltk
NltkNltk
Nltk
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Natural Language Processing in Alternative and Augmentative Communication
Natural Language Processing in Alternative and Augmentative CommunicationNatural Language Processing in Alternative and Augmentative Communication
Natural Language Processing in Alternative and Augmentative Communication
 
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
 
Machine learning-and-data-mining-19-mining-text-and-web-data
Machine learning-and-data-mining-19-mining-text-and-web-dataMachine learning-and-data-mining-19-mining-text-and-web-data
Machine learning-and-data-mining-19-mining-text-and-web-data
 
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...
Symbol Emergence in Robotics: Language Acquisition via Real-world Sensorimoto...
 
UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
 
Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - Introduction
 
September 2021: Top10 Cited Articles in Natural Language Computing
September 2021: Top10 Cited Articles in Natural Language ComputingSeptember 2021: Top10 Cited Articles in Natural Language Computing
September 2021: Top10 Cited Articles in Natural Language Computing
 
Powerful landscape of natural language processing
Powerful landscape of natural language processingPowerful landscape of natural language processing
Powerful landscape of natural language processing
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Text Analytics for Security
Text Analytics for SecurityText Analytics for Security
Text Analytics for Security
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 
Tutorial: Text Analytics for Security
Tutorial: Text Analytics for SecurityTutorial: Text Analytics for Security
Tutorial: Text Analytics for Security
 

Similar to Introduction to Natural Language Processing

Timo Honkela: Epistemological status of linguistic theories and models
Timo Honkela: Epistemological status of linguistic theories and modelsTimo Honkela: Epistemological status of linguistic theories and models
Timo Honkela: Epistemological status of linguistic theories and modelsTimo Honkela
 
Ethnography (2).pptx
Ethnography (2).pptxEthnography (2).pptx
Ethnography (2).pptxHumaWaheed4
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowTony Russell-Rose
 
Plain language and artificial intelligence
Plain language and artificial intelligencePlain language and artificial intelligence
Plain language and artificial intelligenceRomina Marazzato Sparano
 
Effective Presentations using Data Visualization
Effective Presentations using Data VisualizationEffective Presentations using Data Visualization
Effective Presentations using Data VisualizationHeather Wilmore Hornbeak
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingIla Group
 
Use Your Words: Content Strategy to Influence Behavior
Use Your Words: Content Strategy to Influence BehaviorUse Your Words: Content Strategy to Influence Behavior
Use Your Words: Content Strategy to Influence BehaviorLiz Danzico
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptOlusolaTop
 
Supercharge Your Reading Instruction with iPads
Supercharge Your Reading Instruction with iPadsSupercharge Your Reading Instruction with iPads
Supercharge Your Reading Instruction with iPadsBillieRengo
 
Monkeybars for Young Minds: New Basics for New World Kids
Monkeybars for Young Minds: New Basics for New World KidsMonkeybars for Young Minds: New Basics for New World Kids
Monkeybars for Young Minds: New Basics for New World KidsSusan Marcus
 
No technology? No Problem! Creative Ways of Using UDL without Technology (Jul...
No technology? No Problem! Creative Ways of Using UDL without Technology (Jul...No technology? No Problem! Creative Ways of Using UDL without Technology (Jul...
No technology? No Problem! Creative Ways of Using UDL without Technology (Jul...Matt Bergman
 
Academic Writing: Think before you write
Academic Writing: Think before you writeAcademic Writing: Think before you write
Academic Writing: Think before you writeRon Martinez
 
Ethnography research 1
Ethnography  research 1Ethnography  research 1
Ethnography research 1Joshua Batalla
 
Chi-Un Lei "Text Mining and Educational Discourse"
Chi-Un Lei "Text Mining and Educational Discourse"Chi-Un Lei "Text Mining and Educational Discourse"
Chi-Un Lei "Text Mining and Educational Discourse"CITE
 

Similar to Introduction to Natural Language Processing (20)

Multiple Intelligence Students Workshop
Multiple Intelligence Students WorkshopMultiple Intelligence Students Workshop
Multiple Intelligence Students Workshop
 
Timo Honkela: Epistemological status of linguistic theories and models
Timo Honkela: Epistemological status of linguistic theories and modelsTimo Honkela: Epistemological status of linguistic theories and models
Timo Honkela: Epistemological status of linguistic theories and models
 
Ethnography (2).pptx
Ethnography (2).pptxEthnography (2).pptx
Ethnography (2).pptx
 
intro.ppt
intro.pptintro.ppt
intro.ppt
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and Tomorrow
 
Plain language and artificial intelligence
Plain language and artificial intelligencePlain language and artificial intelligence
Plain language and artificial intelligence
 
Effective Presentations using Data Visualization
Effective Presentations using Data VisualizationEffective Presentations using Data Visualization
Effective Presentations using Data Visualization
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
08 09.4.what is-discourse-2
08 09.4.what is-discourse-208 09.4.what is-discourse-2
08 09.4.what is-discourse-2
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Use Your Words: Content Strategy to Influence Behavior
Use Your Words: Content Strategy to Influence BehaviorUse Your Words: Content Strategy to Influence Behavior
Use Your Words: Content Strategy to Influence Behavior
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Gloucester
GloucesterGloucester
Gloucester
 
Supercharge Your Reading Instruction with iPads
Supercharge Your Reading Instruction with iPadsSupercharge Your Reading Instruction with iPads
Supercharge Your Reading Instruction with iPads
 
Monkeybars for Young Minds: New Basics for New World Kids
Monkeybars for Young Minds: New Basics for New World KidsMonkeybars for Young Minds: New Basics for New World Kids
Monkeybars for Young Minds: New Basics for New World Kids
 
No technology? No Problem! Creative Ways of Using UDL without Technology (Jul...
No technology? No Problem! Creative Ways of Using UDL without Technology (Jul...No technology? No Problem! Creative Ways of Using UDL without Technology (Jul...
No technology? No Problem! Creative Ways of Using UDL without Technology (Jul...
 
Academic Writing: Think before you write
Academic Writing: Think before you writeAcademic Writing: Think before you write
Academic Writing: Think before you write
 
Ethnography research 1
Ethnography  research 1Ethnography  research 1
Ethnography research 1
 
LANGUAGE &THOUGHT -2.ppt
LANGUAGE &THOUGHT -2.pptLANGUAGE &THOUGHT -2.ppt
LANGUAGE &THOUGHT -2.ppt
 
Chi-Un Lei "Text Mining and Educational Discourse"
Chi-Un Lei "Text Mining and Educational Discourse"Chi-Un Lei "Text Mining and Educational Discourse"
Chi-Un Lei "Text Mining and Educational Discourse"
 

More from Tony Russell-Rose

Visual approaches to patent retrieval
Visual approaches to patent retrievalVisual approaches to patent retrieval
Visual approaches to patent retrievalTony Russell-Rose
 
Towards Explainability in Professional Search
Towards Explainability in Professional SearchTowards Explainability in Professional Search
Towards Explainability in Professional SearchTony Russell-Rose
 
Putting search theory to work on large datasets
Putting search theory to work on large datasetsPutting search theory to work on large datasets
Putting search theory to work on large datasetsTony Russell-Rose
 
NLP techniques for automated query suggestions
NLP techniques for automated query suggestionsNLP techniques for automated query suggestions
NLP techniques for automated query suggestionsTony Russell-Rose
 
Think outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulationThink outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulationTony Russell-Rose
 
Designing the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of DiscoveryDesigning the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of DiscoveryTony Russell-Rose
 
User requirements for complex search strategies
User requirements for complex search strategiesUser requirements for complex search strategies
User requirements for complex search strategiesTony Russell-Rose
 
Sentiment analysis in healthcare
Sentiment analysis in healthcareSentiment analysis in healthcare
Sentiment analysis in healthcareTony Russell-Rose
 
A Model of Consumer Search Behaviour
A Model of Consumer Search BehaviourA Model of Consumer Search Behaviour
A Model of Consumer Search BehaviourTony Russell-Rose
 
A taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implicationsA taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implicationsTony Russell-Rose
 
From search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutionsFrom search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutionsTony Russell-Rose
 
UI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information DiscoveryUI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information DiscoveryTony Russell-Rose
 

More from Tony Russell-Rose (15)

Visual approaches to patent retrieval
Visual approaches to patent retrievalVisual approaches to patent retrieval
Visual approaches to patent retrieval
 
Towards Explainability in Professional Search
Towards Explainability in Professional SearchTowards Explainability in Professional Search
Towards Explainability in Professional Search
 
Putting search theory to work on large datasets
Putting search theory to work on large datasetsPutting search theory to work on large datasets
Putting search theory to work on large datasets
 
NLP techniques for automated query suggestions
NLP techniques for automated query suggestionsNLP techniques for automated query suggestions
NLP techniques for automated query suggestions
 
Think outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulationThink outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulation
 
Designing the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of DiscoveryDesigning the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of Discovery
 
User requirements for complex search strategies
User requirements for complex search strategiesUser requirements for complex search strategies
User requirements for complex search strategies
 
Sentiment analysis in healthcare
Sentiment analysis in healthcareSentiment analysis in healthcare
Sentiment analysis in healthcare
 
A Model of Consumer Search Behaviour
A Model of Consumer Search BehaviourA Model of Consumer Search Behaviour
A Model of Consumer Search Behaviour
 
A Taxonomy of Site Search
A Taxonomy of Site SearchA Taxonomy of Site Search
A Taxonomy of Site Search
 
Patterns of Personalization
Patterns of PersonalizationPatterns of Personalization
Patterns of Personalization
 
A taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implicationsA taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implications
 
From search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutionsFrom search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutions
 
From Search to Discovery
From Search to DiscoveryFrom Search to Discovery
From Search to Discovery
 
UI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information DiscoveryUI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information Discovery
 

Recently uploaded

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 

Recently uploaded (20)

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 

Introduction to Natural Language Processing

  • 2. Tony Russell-Rose, PhD Director, UXLabs  Led applied research & technology innovation at Microsoft, Canon, Reuters, BT Labs, HP Labs, Endeca  Visiting Professor of Cognitive Computing & AI, Essex University. Founder of Search Solutions conference series  Founder, 2dSearch: next generation advanced search  PhD (natural language interfaces), MSc in human-computer interaction, first degree in engineering (human factors)  5 patents for search user interface design. 80+ scientific and technical papers on AI/NLP, information retrieval and user experience  Honorary Visiting Fellow at City University London (Centre for Interactive Systems Research)  Blog: http://isquared.wordpress.com Industry Expertise  Media & Publishing  Healthcare  Business Intelligence  Financial Services  eCommerce Domain Expertise  UX research  Information architecture  Search & information retrieval  AI + machine learning 2Intro to Natural Language Processing
  • 3.  Course outline  Background & terminology  Fundamentals  Techniques & applications  Further reading 16-Jan-19 Intro to Natural Language Processing 3
  • 4. 16-Jan-19 Intro to Natural Language Processing 4 Week Lecture Practical Topic 1 14-Jan-2019 16-Jan-2019 Intro 2 21-Jan-2019 23-Jan-2019 Basic text processing 3 28-Jan-2019 30-Jan-2019 Language modelling 4 4-Feb-2019 6-Feb-2019 Text classification 5 11-Feb-2019 13-Feb-2019 Sentiment analysis 6 18-Feb-2019 20-Feb-2019 POS tagging 7 25-Feb-2019 27-Feb-2019 Parsing 8 4-Mar-2019 6-Mar-2019 Information Extraction 11-Mar-2019 13-Mar-2019 9 18-Mar-2019 20-Mar-2019 Information retrieval 10 25-Mar-2019 27-Mar-2019 Lexical semantics 11 1-Apr-2019 3-Apr-2019 Word embeddings 12 8-Apr-2019 10-Apr-2019 Question answering 13 15-Apr-2019 17-Apr-2019 Summarization 22-Apr-2019 24-Apr-2019 14 29-Apr-2019 EXAMS Revision
  • 5.  As humans we do it effortlessly ... don’t we?  DRUNK GETS NINEYEARS INVIOLIN CASE  PROSTITUTESAPPEALTO POPE  STOLEN PAINTING FOUND BYTREE  REDTAPE HOLDS UP NEW BRIDGE  DEER KILL 300,000  RESIDENTS CAN DROP OFFTREES  INCLUDE CHILDRENWHEN BAKING COOKIES  MINERS REFUSETOWORK AFTER DEATH 16-Jan-19 Intro to Natural Language Processing 5
  • 6.  Polysemy  One word maps to many concepts  e.g. bat  Synonymy  One concept maps to many words 16-Jan-19 Intro to Natural Language Processing 6
  • 7.  Word order  Venetian blind vs. Blind venetian 16-Jan-19 Intro to Natural Language Processing 7
  • 8.  Language is generative  Starbucks coffee is the best  The place I like most when I need to feed my caffeine addiction is the company from Seattle with branches everywhere  Many different ways to express a given idea  Synonymy, paraphrase, metaphor, etc. 16-Jan-19 Intro to Natural Language Processing 8
  • 9.  Language is changing  I want to buy a mobile  Ill-formed input  “accomodation office”  Co-ordination, negation, etc.  This is not a talk about neuro-linguistic programming  Multi-linguality  Claudia Schiffer is on the cover of Elle  Sarcasm, irony, slang, jargon, etc.  That was a wicked lecture  Yep – the coffee break was the best part 16-Jan-19 Intro to Natural Language Processing 9
  • 10. Dan Jurafsky non-standard English Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥ segmentation issues idioms dark horse get cold feet lose face throw in the towel neologisms unfriend Retweet bromance tricky entity names Where is A Bug’s Life playing … Let It Be was recorded … … a mutation on the for gene … world knowledge Mary and Sue are sisters. Mary and Sue are mothers. But that’s what makes it fun! the New York-New Haven Railroad the New York-New Haven Railroad Why else is natural language understanding difficult?
  • 11.  Text Analytics: a set of linguistic, analytical and predictive techniques to extract structure and meaning from unstructured documents  NLP: language can be spoken as well as written  Text Analytics ~= Natural Language Processing ~=Text Mining 16-Jan-19 Intro to Natural Language Processing 11
  • 12.  Computational Linguistics  Use of computational techniques to study linguistic phenomena  Cognitive science  Study of human information processing (perception, language, reasoning, etc.)  Computer science  Theoretical foundations of computation and practical techniques for implementation  Information science  Analysis, classification, manipulation, retrieval and dissemination of information 16-Jan-19 Intro to Natural Language Processing 12
  • 13.  Symbolic approaches  Rule-based, hand coded (by linguists/SMEs)  Knowledge-intensive  Statistical approaches  Distributional & neural approaches, supervised or unsupervised  Data-intensive 16-Jan-19 Intro to Natural Language Processing 13
  • 14.  Language is AMBIGUOUS  To determine structure, we must resolve ambiguity!  Lexical analysis (tokenisation)  The cat sat on the mat  I can’t tokenise this sentence  Stop word removal  No definitive list  The Who, The The, Take That…  To be or not to be 16-Jan-19 Intro to Natural Language Processing 14
  • 15.  Stemming  fishing, fished, fish, fisher -> fish  Lemmatization  Linguistically principled analysis  Passing -> pass + ING  Were -> be + PAST  Delegate = de-leg-ate (?)  Ratify = rat-ify (?)  Morphology (prefixes, suffixes, etc.)  Gebäudereinigungsfirmenangestellter -> Gebäude + Reinigung + Firma + Angestellter (building + cleaning + company + employee) 16-Jan-19 Intro to Natural Language Processing 15
  • 16.  Syntax (part of speech tagging)  book -> NOUN, VERB  that -> DETERMINER  flight -> NOUN  Book that flight -> VB DT NN  Ambiguity problem  Time flies like an arrow -> ?  Fruit flies like a banana -> ?  Eats shoots and leaves -> ? 16-Jan-19 Intro to Natural Language Processing 16
  • 17.  Parsing (grammar)  I saw a venetian blind  I saw a blind venetian  I saw the man on the hill with a telescope  Rugby is a game played by men with odd-shaped balls  Sentence boundary detection  Punctuation denotes the end of a sentence!  “But not always!”, said Fred... 16-Jan-19 Intro to Natural Language Processing 17
  • 18.  ELIZA  Wiezenbaum 1966  Simple pattern matching  PARRY  Colby et al 1972  Pattern matching & planning  SHRDLU  Winograd 1972  Natural language understanding  Comprehensive grammar of English 16-Jan-19 Intro to Natural Language Processing 18
  • 19.  Differentiate between useful index terms and ‘noise’  Help lexicographers identify new terminology:  the C4 carbons of the nicotinamide rings  X-ray therapy vs. X-ray therapies  PAHO vs. Pan-American Health Organization  Term extraction systems process scientific papers to identify terminology 16-Jan-19 Intro to Natural Language Processing 19
  • 20.  Identification of key concepts,  e.g. people, places, organisations, etc.  Also postcodes, temporal/numerical expressions, etc.  “Mexico has been trying to stage a recovery since the beginning of this year and it's always been getting ahead of itself in terms of fundamentals,” said Matthew Hickman of Lehman Brothers in NewYork.” 16-Jan-19 Intro to Natural Language Processing 20 Persons Organisations Cities Countries Matthew Hickman Lehman Brothers NewYork Mexico
  • 21.  Entities + relationships  Based on pre-defined structures, e.g.  movements of company executives  corporate mergers / acquisitions  interactions between genes and proteins  Example (from MUC-6):  Now, Mr.James is preparing to sail into the sunset, and Mr. Dooner is poised to rev up the engines to guide Interpublic Group's McCann-Erickson into the 21st century.Yesterday, McCann made official what had been widely anticipated: Mr. James, 57 years old, is stepping down as chief executive officer on July 1 and will retire as chairman at the end of the year. He will be succeeded by Mr. Dooner, 45. 16-Jan-19 Intro to Natural Language Processing 21
  • 22. <SUCCESSION_EVENT-2> := SUCCESSION_ORG: <ORGANIZATION-1> POST: "chairman" IN_AND_OUT: <IN_AND_OUT-4> VACANCY_REASON: DEPART_WORKFORCE <IN_AND_OUT-4> := IO_PERSON: <PERSON-1> NEW_STATUS: IN ON_THE_JOB: NO OTHER_ORG: <ORGANIZATION-1> REL_OTHER_ORG: SAME_ORG <ORGANIZATION-1> := ORG_NAME: "McCann-Erickson" ORG_ALIAS: "McCann" ORG_TYPE: COMPANY <PERSON-1> := PER_NAME: "John J. Dooner Jr." PER_ALIAS: "John Dooner" "Dooner" 16-Jan-19 Intro to Natural Language Processing 22
  • 23.  Can now use as metadata (for retrieval)  Or store in DB and query against it, e.g.  “show me all the events where a finance officer left a company to take up the position of CEO in another company” 16-Jan-19 Intro to Natural Language Processing 23
  • 24. Dan Jurafsky Information Extraction Subject: curriculum meeting Date: January 15, 2012 To: Dan Jurafsky Hi Dan, we’ve now scheduled the curriculum meeting. It will be in Gates 159 tomorrow from 10:00-11:30. -Chris 24 Create new Calendar entry Event: Curriculum mtg Date: Jan-16-2012 Start: 10:00am End: 11:30am Where: Gates 159
  • 25.  Anaphora resolution relies on context (semantic / pragmatic knowledge):  We gave the bananas to the monkeys because they were hungry.  We gave the bananas to the monkeys because they were ripe.  We gave the bananas to the monkeys because they were here. 16-Jan-19 Intro to Natural Language Processing 25
  • 26.  Resolve ambiguities due to polysemy  Often characterised by syntax, e.g.  Magnesium is a light metal (ADJ)  The light in the kitchen is quite bright (NOUN)  Can use a Part of SpeechTagger to resolve broad ambiguities  PoS tags = NN, NNP, NNS, NNPS, etc. 16-Jan-19 Intro to Natural Language Processing 26
  • 27.  How do we measure NLP accuracy?  Compare against human annotation  However:  Humans often disagree ▪ Particularly for complex tasks ▪ IAA places an upper bound on performance  Comparison can be measured in many ways ▪ “Bill Gates is chairman of Microsoft” -> “Gates” = person? 16-Jan-19 Intro to Natural Language Processing 27
  • 28.  Human-computer interfaces (chatbots etc.)  Text classification  Text summarisation  Machine translation  Speech recognition & synthesis  Natural language generation  Text mining  Question answering  Sentiment Analysis 16-Jan-19 Intro to Natural Language Processing 28
  • 29.  Media monitoring  classify incoming news stories  Search engines:  classify query intent, e.g. search Google for ‘LOG313’  Spam detection 16-Jan-19 Intro to Natural Language Processing 29
  • 30. Dan Jurafsky Machine Translation • Fully automatic 30 • Helping human translators Enter Source Text: Translation from Stanford’s Phrasal: 这 不过 是 一 个 时间 的 问题 . This is only a matter of time.
  • 31.  Summarization  single-document vs. multi-document  Search results  MSWord  Research / analysis tools 16-Jan-19 Intro to Natural Language Processing 31
  • 32.  Chatbots  Smart speakers  Smartphone assistants  Call handling systems  Travel  Hospitality  Banking 16-Jan-19 Intro to Natural Language Processing 32
  • 33.  Analogy with Data Mining  Discover or infer new knowledge from unstructured text resources  A <-> B and B <-> C  Infer A <-> C?  e.g. link between migraine headaches and magnesium deficiency  Applications in life sciences, media/publishing, counter terrorism and competitive intelligence 16-Jan-19 Intro to Natural Language Processing 33
  • 34.  Going beyond the document retrieval paradigm  provide specific answers to specific questions  Introduced as aTREC track in 1999 16-Jan-19 Intro to Natural Language Processing 34
  • 35. Dan Jurafsky Question Answering: IBM’s Watson • Won Jeopardy on February 16, 2011! 35 WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL Bram Stoker
  • 36.  Identify and extract subjective information  Several sub-tasks:  Identify polarity, e.g. of movie reviews ▪ e.g. positive, negative, or neutral  Identify emotional states ▪ e.g. angry, sad, happy, etc.  Subjectivity/objectivity identification ▪ E.g. “fact” from opinion  Feature/aspect-based ▪ Differentiate between specific features or aspects of entities 16-Jan-19 Intro to Natural Language Processing 36
  • 37. Dan Jurafsky Information Extraction & Sentiment Analysis • nice and compact to carry! • since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either! • the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera 37 Size and weight Attributes: zoom affordability size and weight flash ease of use ✓ ✗ ✓
  • 38.  Commodity technologies:  Tokenisation, normalization, regular expression search  Mainstream, but room for improvement:  Lemmatization, spelling correction, speech synthesis  Mainstream but needs improvement:  PoS tagging, term extraction, summarization, speech recognition, text categorization, spoken dialogue systems  Almost there:  Information extraction, NL generation, simple speech translation  Don’t hold your breath:  Full machine translation, advanced dialogue systems 16-Jan-19 Intro to Natural Language Processing 38
  • 39. Dan Jurafsky Language Technology Coreference resolution Question answering (QA) Part-of-speech (POS) tagging Word sense disambiguation (WSD) Paraphrase Named entity recognition (NER) Parsing Summarization Information extraction (IE) Machine translation (MT) Dialog Sentiment analysis mostly solved making good progress still really hard Spam detection Let’s go to Agra! Buy V1AGRA … ✓ ✗ Colorless green ideas sleep furiously. ADJ ADJ NOUN VERB ADV Einstein met with UN officials in Princeton PERSON ORG LOC You’re invited to our dinner party, Friday May 27 at 8:30 Party May 27 add Best roast chicken in San Francisco! The waiter ignored us for 20 minutes. Carter told Mubarak he shouldn’t run again. I need new batteries for my mouse. The 13th Shanghai International Film Festival… 第13届上海国际电影节开幕… The Dow Jones is up Housing prices rose Economy is good Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness? I can see Alcatraz from the window! XYZ acquired ABC yesterday ABC has been taken over by XYZ Where is Citizen Kane playing in SF? Castro Theatre at 7:30. Do you want a ticket? The S&P500 jumped
  • 40.  Many commercial vendors  Open source/access:  GATE (Sheffield University)  RapidMiner  Stanford NLP  OpenNLP  OpenCalais  LingPipe  nltk 16-Jan-19 Intro to Natural Language Processing 40
  • 41.  Recent additions  Gensim (& word2vec, GloVe, fastText, etc.)  Textacy + Spacy  TextBlob  BERT, ELMo etc. 16-Jan-19 Intro to Natural Language Processing 41
  • 42.  D. Jurafsky and J. Martin, Speech and Language Processing. Prentice-Hall, 2nd edition, 2008.  Bird, Steven, Ewan Klein, and Edward Loper (2009), Natural Language Processing with Python, O'Reilly Media.  Markus Hofmann and Andrew Chisholm. 2016.Text Mining andVisualization: Case Studies Using Open-SourceTools. Chapman & Hall/CRC.  Manning, C. and Schutze, H. Foundations of Statistical Language Processing. MIT Press, 1999. 16-Jan-19 Intro to Natural Language Processing 42
  • 43.  Install/configure Python (ideally 3.6)  Install an IDE (e.g PyCharm)  Set up a venv etc.  Install nltk + download data  See Ch 1 Section 1.2  Work thru Q19 onwards fromCh 1 of nltk book:  https://www.nltk.org/book/ch01.html 16-Jan-19 Intro to Natural Language Processing 43
  • 44. Tony Russell-Rose, PhD FBCS CITP CEng Director, UXLabs | Founder, 2dSearch Visiting Professor, Essex University  Web: uxlabs.co.uk, 2dsearch.com  Email: tgr@uxlabs.co.uk  Blog: http://isquared.wordpress.com  LinkedIn: http://uk.linkedin.com/in/tonyrussellrose  Twitter: @tonygrr 44Intro to Natural Language Processing