This document provides an introduction to computational linguistics. It discusses what linguistics is, including descriptive vs prescriptive grammar and linguistic methods. It then covers areas of linguistic theory like phonetics, phonology, morphology, syntax, semantics and pragmatics. It defines computational linguistics as applying these linguistic principles to problems in computer science, with goals like language understanding, learning and translation. It provides examples of computational linguistics tasks and discusses why the field can seem ad hoc.
1. Computational Linguistics
Lecture 1: Introduction
What is Linguistics?
Prescriptive Grammar:
Descriptive Grammar:
Some Linguistic Methods
Grammatical Theory
Linguistics Beyond Grammar
linguistics
What is computational linguistics?
Dr. Saeed Rahati
2. What is Linguistics?
Linguistics is the study of human language,
broadly construed.
Linguistics is a scientific discipline with
established theories, analytic methods, and
real-world applications.
Linguists often study individual languages,
but...
When linguists study individual languages,
they have larger issues in mind.
3. … What is Linguistics?
Linguistics is descriptive,
not prescriptive
4. Prescriptive Grammar:
Rules against certain usages.
Few if any rules for what is
allowed.
Condemns forms generally in use.
Explicitly normative enterprise.
5. Descriptive Grammar:
Rules characterizing what people do
say.
Tries to do so in a way that reflects
internalized generalizations that
people have made.
Linguists are fundamentally
concerned with linguistic
knowledge.
6. Anyway, language isn’t logical:
• parkway vs. driveway
• maternity dress vs. paternity suit
• bathing trunks (plural) vs. bikini (singular)
• you are vs. *you is
• Aren’t I clever? vs. *I aren’t clever.
7. Some Linguistic Methods
◮ Fieldwork
◮ Formal analysis of patterns in data
sets
◮ Psycholinguistic experiments
◮ Computational modeling
◮ Corpus analysis
8. Grammatical Theory
◮ Phonetics: The study of speech sounds
◮ Phonology: The study of sound systems
◮ Morphology: The study of word structure
◮ Syntax: The study of sentence structure
◮ Semantics: The study of linguistic meaning
◮ Pragmatics: The study of language use
11. Morphology: The Study of Word Structure
◮ missile: ‘ICBM’
◮ anti-tank-missile: ‘missile targeting tanks’
◮ anti-aircraft-missile: ‘missile targeting aircraft’
◮ anti-missile-missile: ‘missile targeting ICBMs’
12. Morphological Rules
◮ Rule: Anti-X-missile is a missile targeting Xs.
◮ What kind of missile targets anti-missile-missiles?
◮ anti-anti-missile-missile-missile
◮ anti-anti-anti-missile-missile-missile-missile:
‘missile targeting anti-anti-missile-missile-missiles’
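The Anti-X-missile rule can be applied to its own output, which is what produces the longer forms above. A minimal Python sketch of that recursion follows; the function names `anti_missile` and `gloss` are hypothetical and chosen only for illustration.

```python
def anti_missile(depth: int) -> str:
    """Apply the rule Anti-X-missile `depth` times, starting from X = 'missile'."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    word = "missile"
    for _ in range(depth):
        # Wrap the current word X as anti-X-missile.
        word = f"anti-{word}-missile"
    return word


def gloss(depth: int) -> str:
    """Paraphrase the word built at `depth` as 'missile targeting Xs'."""
    if depth < 1:
        raise ValueError("only words built by the rule have this gloss")
    return f"missile targeting {anti_missile(depth - 1)}s"


print(anti_missile(2))  # anti-anti-missile-missile-missile
print(gloss(3))         # missile targeting anti-anti-missile-missile-missiles
```

Each application adds one `anti-` at the front and one `-missile` at the end, so a word built with n applications has n copies of `anti` and n + 1 copies of `missile`. The sketch glosses depth 1 literally as "missile targeting missiles", where the slide uses "ICBMs".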
13. Syntax: The Study of Sentence Structure
◮ I saw the woman with the telescope.
I [saw [the woman] [with the telescope]].
I [saw [[the woman] [with the telescope]]].
◮ Put the block in the box on the table
in the bedroom.
◮ Put the block in the box on the table
in the bedroom near the kitchen.
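Each added prepositional phrase can attach either to the verb phrase or to a noun phrase to its left, so the number of bracketings grows quickly. The sketch below counts the bracketings a toy grammar assigns to a verb, an object, and k prepositional phrases. The grammar, the tag names, and the helper functions are assumptions made for illustration, not part of the lecture, and the grammar ignores the fact that "put" requires a location phrase.

```python
from functools import lru_cache

# Toy binary grammar (an illustrative assumption): (left, right) -> parent.
BINARY_RULES = {
    ("V", "NP"): "VP",   # saw [the woman]
    ("VP", "PP"): "VP",  # PP attaches to the verb phrase
    ("NP", "PP"): "NP",  # PP attaches to a noun phrase
    ("P", "NP"): "PP",   # with [the telescope]
}


def count_parses(tags, root="VP"):
    """Count the distinct bracketings of `tags` that are rooted in `root`."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def count(start, end, label):
        if end - start == 1:
            # A single pre-tagged chunk matches only its own label.
            return 1 if tags[start] == label else 0
        total = 0
        for (left, right), parent in BINARY_RULES.items():
            if parent != label:
                continue
            # Try every split point between the two children.
            for mid in range(start + 1, end):
                total += count(start, mid, left) * count(mid, end, right)
        return total

    return count(0, len(tags), root)


def verb_object_pps(k):
    """Tag sequence for a verb, an object NP, and k prepositional phrases."""
    return ["V", "NP"] + ["P", "NP"] * k


for k in range(1, 4):
    print(k, count_parses(verb_object_pps(k)))
```

Under this toy grammar the counts for one, two, and three prepositional phrases are 2, 5, and 14. The first figure corresponds to the two bracketings of "I saw the woman with the telescope" shown on the slide.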
14. Semantics: The Study of Linguistic Meaning
◮ Structural Ambiguity produces semantic
ambiguity.
◮ Both in morphology and syntax.
◮ Lexical Ambiguity: We screened the
candidates.
◮ Both Together: I saw her duck.
15. Pragmatics: The Study of Language Use
Q: Is Palin a Republican?
A: Is the Pope Catholic?
◮ Why don’t you move up to the City?
◮ Why should I stand here and listen
to this?
◮ Do you think I’m saying this just to
hear the sound of my own voice?
16. Linguistics Beyond Grammar
◮ Historical Linguistics: How languages change over
time.
◮ Sociolinguistics: How languages vary socially. How
language is used as a social resource.
◮ Psycholinguistics: What goes on in people’s heads
as they use language.
◮ Language Acquisition: How people learn language.
(first language acquisition; second language
acquisition)
◮ Computational Linguistics: Making computers
process (generate/‘understand’/translate...) human
languages.
17. Computational Linguistics
[Figure: a scale of fields running from more rigorous to less rigorous (“flakey”). Labels: physics, chemistry, biology, neuropsychology, psychology, literary criticism, “linguistics?” and computational linguistics.]
18. What defines the rigor of a field?
Whether results are reproducible
Whether theories are testable/falsifiable
Whether there is a common set of
methods for similar problems
Whether approaches to problems can yield
interesting new questions/answers
20. Linguistics
[Figure: a scale running from more rigorous to less rigorous. Labels: engineering, linguistics, sociology, literary criticism, and a separate label reading “Computational”.]
21. The true situation with linguistics
less rigorous
other areas of sociolinguistics (e.g. Deborah Tannen)
“theoretical” linguistics (e.g. minimalist syntax)
“theoretical” linguistics (e.g. lexical-functional grammar)
historical linguistics
some areas of sociolinguistics (e.g. Bill Labov)
psycholinguistics
experimental phonetics
more rigorous
22. What is computational linguistics?
Text normalization/segmentation
Morphological analysis
Automatic word pronunciation prediction
Transliteration
Word-class prediction: e.g. part of speech tagging
Parsing
Semantic role labeling
Machine translation
Dialog systems
Topic detection
Summarization
Text retrieval
Bioinformatics
Language modeling for automatic speech recognition
Computer-aided language learning (CALL)
23. Computational linguistics
Often thought of as natural language
engineering
But there is also a serious scientific
component to it.
24. Goals of Computational Linguistics/Natural Language Processing
To get computers to deal with language the
way humans do:
They should be able to understand language
and respond appropriately in language
They should be able to learn human language
the way children do
They should be able to perform linguistic tasks
that skilled humans can do, such as
translation
Yeah, right
25. Some interesting themes…
Finite-state methods:
Many application areas
Raises many interesting questions about how
“regular” language is
Grammar induction:
Linguists have done a poor job at their stated goal of
explaining how humans learn grammar
Computational models of language change:
Historical evidence for language change is only
partial. There are many changes in language for which
we have no direct evidence.
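As one way to make the finite-state theme concrete, the sketch below is a two-state acceptor over hyphen-separated morphemes. It revisits the Anti-X-missile words from the morphology slides. The state numbering, the `TRANSITIONS` table, and the `accepts` function are assumptions made for illustration.

```python
# State 0: still reading "anti" prefixes. State 1: reading "missile" (accepting).
TRANSITIONS = {
    (0, "anti"): 0,
    (0, "missile"): 1,
    (1, "missile"): 1,
}
ACCEPTING = {1}


def accepts(word: str) -> bool:
    """Return True if the morpheme sequence matches anti* missile+."""
    state = 0
    for morpheme in word.split("-"):
        state = TRANSITIONS.get((state, morpheme))
        if state is None:
            # No transition is defined for this morpheme in this state.
            return False
    return state in ACCEPTING


print(accepts("anti-missile-missile"))  # True
print(accepts("anti-anti-missile"))     # True, although the rule never builds it
print(accepts("missile-anti"))          # False
```

The acceptor gets the order of morphemes right but cannot check that a word has exactly one more `missile` than `anti`, because a fixed set of states cannot count without bound. That gap is one small instance of the question of how “regular” language is.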
26. Why CL may seem ad hoc
Wide variety of areas (as in linguistics)
If it’s natural language engineering the
goal is often just to build something that
works
Techniques tend to change in somewhat
faddish ways…
For example: machine learning approaches
fall in and out of favor