3. LOGO
USER : Men are all alike.
ELIZA : In what way?
USER : They’re always bugging us about something or other.
ELIZA : Can you think of a specific example?
USER : Well, my boyfriend made me come here.
ELIZA : Your boyfriend made you come here ?
USER : He says I’m depressed much of the time.
ELIZA : I am sorry to hear you are depressed.
USER : It’s true, I am unhappy.
ELIZA : Do you think coming here will help you not to be unhappy?
USER : I need some help; that much seems certain.
ELIZA : What would it mean to you if you got some help?
USER : Perhaps I could learn to get along with my mother.
ELIZA : Tell me more about your family.
USER : My mother takes care of me.
ELIZA : Who else in your family takes care of you?
USER : My father.
ELIZA : Your father ?
USER : You are like my father in some ways.
ELIZA : What resemblance do you see?
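The dialogue above can be approximated with a handful of pattern-and-template rules. The following is a minimal sketch only: the rule list, the REFLECTIONS table and the function names are assumptions made for illustration, not ELIZA's original script.

```python
import re

# Hypothetical reflection table: swaps first-person words for second-person ones.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

# Hypothetical rules (pattern, response template); the first matching rule wins.
RULES = [
    (re.compile(r"\bmy (mother|father)\b", re.I), "Tell me more about your family."),
    (re.compile(r"\bi need ([^;,.]+)", re.I), "What would it mean to you if you got {0}?"),
    (re.compile(r"\bi am ([^;,.]+)", re.I), "How long have you been {0}?"),
]

def reflect(fragment):
    """Rewrite a captured fragment from the user's point of view to the program's."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def respond(utterance):
    """Return the response of the first rule whose pattern occurs in the utterance."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # fallback when no rule matches
```

For example, respond("I need some help; that much seems certain.") fills the second template with the captured fragment "some help". No understanding is involved, only surface matching.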
4. A subfield of Artificial Intelligence, active since the 1960s …
Concerned with the interactions between computers and
human languages, with one ultimate goal: computers that can
“understand” human language
Many real-world applications
5. Natural language unit?
Natural language understanding
Natural language generation
Data?
Speech processing
Text processing
Natural language text understanding!
6. Task of generating natural language from a machine
representation
May be viewed as the opposite of natural language
understanding.
Applications:
Joke generation
Textual summaries of databases
Enhancing accessibility
7. An advanced subtopic of NLP that deals with reading
comprehension
More complex than NLG
Much commercial interest in this field
News-gathering
Data-Mining
Voice-Activation
Large-scale content analysis
8. Logic is too clear-cut; the loss of flexibility causes
difficulties in NLP
Examples:
Time flies like an arrow
I never said she stole my money!
Can be understood in 7 ways, depending on which word is stressed:
Someone else said it, but I didn't.
I simply didn't ever say it
I might have implied it in some way, but I never explicitly said it
I said someone took it; I didn't say it was she
I just said she probably borrowed it
I said she stole someone else's money
I said she stole something, but not my money
15. Word combination and division
Placement of stress on words
The properties of subjects
We gave the monkeys the bananas because they were
hungry
We gave the monkeys the bananas because they were
over-ripe
Specifying which word an adjective applies to
A pretty little girls' school
16. Involves reasoning about the world
Embedded in a social system of people interacting:
persuading, insulting and amusing each other
changing over time
Homonyms
28. ePi Group:
Automatic Vietnamese processing system
www.baomoi.com
Collecting news from all Vietnamese e-newspapers
EVTrans – Softex Co Ltd.
Cyclop
VnKim
33. Morphological analysis:
Individual words are analyzed into their
components
Syntactic analysis:
Linear sequences of words are transformed
into structures that show how the words
relate to each other
Semantic analysis:
A transformation is made from the input
text to an internal representation that
reflects the meaning
Pragmatic analysis:
Reinterpreting what was said as what was
actually meant
Discourse analysis:
Resolving references between sentences
36. Morphemes: the smallest meaningful
spoken units of language.
Stem: book, cat, car, …
Affixes: un-, -s, -es, …
Clitics: ’ve, ’m
Morphological parsing: parsing a word
into stem and affixes and identifying the
parts and their relationships
Morphology / Syntax / Semantic / Pragmatic / Discourse
37. Word Classes
Parts of speech: noun, verb, adjective,
etc.
Word class dictates how a word combines
with morphemes to form new words
Examples
Books = book + s
Unladylike = un + lady + like
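The two decompositions above can be reproduced with a very small affix stripper. This is a sketch under stated assumptions: the lexicon, the affix lists and the name parse_word are hypothetical, and a real morphological parser would use a finite-state transducer and a full lexicon.

```python
# Hypothetical toy lexicon and affix lists, for illustration only.
STEMS = {"book", "cat", "car", "lady"}
PREFIXES = ["un"]
SUFFIXES = ["es", "s", "like"]

def parse_word(word):
    """Return (prefixes, stem, suffixes), or None if no analysis is found."""
    word = word.lower()
    prefixes = []
    for prefix in PREFIXES:
        # Strip at most one known prefix from the left.
        if word.startswith(prefix) and len(word) > len(prefix):
            prefixes.append(prefix)
            word = word[len(prefix):]
            break
    suffixes = []
    # Strip known suffixes from the right until a known stem remains.
    while word not in STEMS:
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix):
                suffixes.insert(0, suffix)
                word = word[:-len(suffix)]
                break
        else:
            return None  # no suffix applies and the remainder is not a stem
    return prefixes, word, suffixes
```

The sketch strips a prefix greedily and never backtracks, so a stem that itself begins with "un" would be misanalyzed; it is meant only to make the stem-plus-affix idea concrete.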
38. Vietnamese?
Ăn (eat) = ăn
Uống (drink) = uống
Xe (vehicle) = xe
No ‘Xes’ in Vietnamese!
The problem is text tokenization.
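Because Vietnamese words do not inflect but may span several space-separated syllables, tokenization amounts to grouping syllables into words. One common baseline is longest matching against a lexicon. The sketch below assumes a hypothetical two-entry lexicon and a maximum word length of two syllables; it is not the method of any particular system named in these slides.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so composed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", text)

# Hypothetical mini-lexicon of multi-syllable words (pupil, bicycle).
LEXICON = {nfc("học sinh"), nfc("xe đạp")}
MAX_SYLLABLES = 2  # assumption: longest lexicon entry has two syllables

def segment(sentence):
    """Group space-separated syllables into words by longest lexicon match."""
    syllables = nfc(sentence).split()
    words, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first, down to two syllables.
        for size in range(min(MAX_SYLLABLES, len(syllables) - i), 1, -1):
            candidate = " ".join(syllables[i:i + size])
            if candidate in LEXICON:
                words.append(candidate)
                i += size
                break
        else:
            words.append(syllables[i])  # fall back to a single-syllable word
            i += 1
    return words
```

Longest matching cannot resolve cases where two segmentations are both covered by the lexicon, which is part of why tokenization is a real problem for Vietnamese.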
39. Why parse words?
To identify a word’s part of speech
To identify a word’s stem (IR)
… then?
Spell-checking
To predict the next words
To predict the word’s accent
40. Ambiguity
I want her to go to the cinema with me
To - infinitive?
To - preposition?
Con ngựa đá đá con ngựa đá. (The stone horse kicks the stone horse.)
đá = đá?
41. How to implement?
Regular expression
Finite State Transducers (FST)
Finite State Acceptor (FSA)
*.exe
ir??man
\b[0-9]+ *(Mb|[Mm]egabytes?)\b
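The three patterns on this slide can be tried out directly. The sketch below assumes that the leading and trailing "b" in the last pattern stand for the word-boundary escape \b, and that the first two patterns are shell-style wildcards; the function name find_sizes is hypothetical.

```python
import fnmatch
import re

# Disk-size pattern from the slide, with word boundaries written as \b (assumed).
SIZE = re.compile(r"\b[0-9]+ *(Mb|[Mm]egabytes?)\b")

def find_sizes(text):
    """Return every substring of text matched by the size pattern."""
    return [match.group(0) for match in SIZE.finditer(text)]

# Shell-style wildcards: * matches any run of characters, ? exactly one.
is_exe = fnmatch.fnmatchcase("setup.exe", "*.exe")
is_ironman = fnmatch.fnmatchcase("ironman", "ir??man")
```

Here find_sizes("512 Mb free") picks out "512 Mb", while "512 Mbps" yields nothing because the trailing word boundary fails.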
43. Related terms:
Stem, stemming
Part of speech
N-gram
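An N-gram is simply a run of N consecutive tokens, and counting bigrams gives a crude way to predict the next word. A minimal sketch, with hypothetical function names and whitespace tokenization assumed:

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def next_word_table(tokens):
    """Count, for each word, which words follow it (bigram counts)."""
    table = defaultdict(Counter)
    for first, second in ngrams(tokens, 2):
        table[first][second] += 1
    return table
```

With the tokens of "the cat saw the cat and the dog", the table records that "the" is followed by "cat" twice and by "dog" once, so "cat" would be the predicted next word after "the".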
46. Linear sequences of words are transformed into
structures that show how the words relate to
each other.
Determine grammatical structure.
I am a boy = [Subject] [Verb] [Determiner] [Noun]
48. Syntax
Actual structure of a sentence
Grammar
The rule set used in the analysis
49. A grammar defines syntactically legal sentences
I ate an apple (syntactically legal)
I ate apple (not syntactically legal)
I ate a building (syntactically legal, but?)
Syntactically legal doesn’t mean that it’s meaningful!
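The three example sentences can be checked against a toy grammar. The sketch below assumes the rules S -> NP VP, NP -> Pronoun | Determiner Noun, VP -> Verb NP and a four-category word list; all names are hypothetical, and the grammar is far too small for real text.

```python
# Hypothetical word lists for a toy grammar.
PRONOUNS = {"i"}
VERBS = {"ate"}
DETERMINERS = {"a", "an"}
NOUNS = {"apple", "building"}

def parse_np(tokens, i):
    """Read a noun phrase starting at position i; return the next position or None."""
    if i < len(tokens) and tokens[i] in PRONOUNS:
        return i + 1  # NP -> Pronoun
    if i + 1 < len(tokens) and tokens[i] in DETERMINERS and tokens[i + 1] in NOUNS:
        return i + 2  # NP -> Determiner Noun
    return None

def is_legal(sentence):
    """True if the sentence is derivable as S -> NP VP with VP -> Verb NP."""
    tokens = sentence.lower().split()
    i = parse_np(tokens, 0)
    if i is None or i >= len(tokens) or tokens[i] not in VERBS:
        return False
    return parse_np(tokens, i + 1) == len(tokens)
```

The recognizer accepts "I ate a building" just as readily as "I ate an apple": it checks structure only, and rejecting the former would take semantic knowledge.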
53. What could this mean…
Representations of linguistic inputs that capture
the meanings of those inputs
For us it means
Representations that permit or facilitate
semantic processing
Permit us to reason about their truth
(relationship to some world)
Permit us to answer questions based on their
content
Permit us to perform inference (answer
questions and determine the truth of things we
don’t actually know)
57. Pragmatics: concerns how sentences are
used in different situations and how use
affects the interpretation of the sentence
Discourse: concerns how the
immediately preceding sentences affect
the interpretation of the next sentence
58. ‘He’, ‘it’, ‘his’ can be inferred from
the previous sentence
That is discourse
69. Can we use previously translated text to learn how to
translate new texts?
Yes! But, it’s not so easy
Two paradigms, statistical MT, and EBMT
Requirements:
Aligned large parallel corpus of translated sentences
{S_source, S_target}
Bilingual dictionary for intra-S alignment
Generalization patterns (names, numbers, dates…)
70. Simplest: Translation Memory
If S_new = S_source in the corpus, output the aligned S_target
Compositional EBMT
If a fragment of S_new matches a fragment of S_source, output the
corresponding fragment of the aligned S_target
Prefer maximal-length fragments
Maximize grammatical compositionality
Via a target language grammar
Or, via an N-gram statistical language model
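The first two steps above, exact lookup and then longest-fragment matching, can be sketched as follows. The English-French sentence and fragment tables are hypothetical stand-ins for an aligned corpus, and the final step (choosing the most grammatical combination via a grammar or language model) is left out.

```python
# Hypothetical aligned data: whole sentences and sub-sentence fragments.
SENTENCES = {"good morning": "bonjour"}
FRAGMENTS = {"thank you": "merci", "very much": "beaucoup", "my friend": "mon ami"}

def translate(sentence):
    """Translation memory first, then greedy longest-fragment composition."""
    words = sentence.lower().split()
    key = " ".join(words)
    if key in SENTENCES:  # translation memory: S_new equals a stored source sentence
        return SENTENCES[key]
    output, i = [], 0
    while i < len(words):
        # Prefer maximal-length fragments: try the longest span first.
        for j in range(len(words), i, -1):
            fragment = " ".join(words[i:j])
            if fragment in FRAGMENTS:
                output.append(FRAGMENTS[fragment])
                i = j
                break
        else:
            output.append("<" + words[i] + ">")  # no match: mark the word untranslated
            i += 1
    return " ".join(output)
```

Greedy composition keeps the fragment order of the source sentence, which is exactly where a target-language grammar or N-gram model would be needed to reorder and smooth the output.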
71. Requires an Interlingua - language-neutral Knowledge
Representation (KR)
Philosophical debate: Is there an interlingua?
FOL is not totally language neutral (predicates,
functions, expressed in a language)
Other near-interlinguas (Conceptual Dependency)
Requires a fully-disambiguating parser
Domain model of legal objects, actions, relations
Requires a NL generator (KR -> text)
Applicable only to well-defined technical domains
Produces high-quality MT in those domains
73. Each approach has its own strengths
Rapidly adaptable: statistical, example-based
Good grammar: rule-based (grammar)
High precision in narrow domain: Interlingua
75. Spider - a browser-like program that downloads web pages.
Crawler - a program that automatically follows all of the
links on each web page.
Indexer - a program that analyzes web pages downloaded
by the spider and the crawler.
Database - storage for downloaded and processed pages.
Results engine - extracts search results from the database.
Web server - a server that is responsible for interaction
between the user and other search engine components.
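The indexer, database and results engine can be illustrated with an in-memory inverted index. In this sketch the PAGES dictionary is a hypothetical stand-in for what the spider and crawler would download, and ranking is omitted entirely.

```python
from collections import defaultdict

# Hypothetical pages standing in for the output of the spider and crawler.
PAGES = {
    "page1": "natural language processing with computers",
    "page2": "search engines index web pages",
    "page3": "language models for search",
}

def build_index(pages):
    """Indexer: map each word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Results engine: return sorted ids of pages containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)
```

A query such as "language search" intersects the posting sets of both words, leaving only the pages that contain each of them.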
80. The idea is to ‘extract’ particular types of information from
arbitrary text or transcribed speech
Examples:
Named entities: people, places, organizations
Telephone numbers
Dates
Many uses:
Question answering systems, gisting of news or mail…
Job ads, financial information, terrorist attacks
81. Often uses a set of simple templates or frames with slots
to be filled in from the input text. Everything else is ignored.
Husni’s number is 966-3-860-2624.
The inventor of the first plane was Abbas ibnu Fernas.
The British King died in March of 1932.
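Slot filling for two of the example sentences can be sketched with regular expressions. The slot names, the phone-number shape (taken from the one example above) and the function name fill_template are assumptions; real extraction systems use much richer patterns.

```python
import re

# Phone pattern shaped after the single example on the slide (assumption).
PHONE = re.compile(r"\b\d{3}-\d-\d{3}-\d{4}\b")
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches forms such as "March 1932" and "March of 1932".
DATE = re.compile(r"\b(" + MONTHS + r")(?: of)? (\d{4})\b")

def fill_template(text):
    """Fill the phone, month and year slots from text; ignore everything else."""
    template = {"phone": None, "month": None, "year": None}
    phone = PHONE.search(text)
    if phone:
        template["phone"] = phone.group(0)
    date = DATE.search(text)
    if date:
        template["month"] = date.group(1)
        template["year"] = date.group(2)
    return template
```

Slots with no matching text stay None, which mirrors the "ignore everything else" behaviour described above.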
82. Named Entity recognition (NE)
Finds and classifies names, places etc.
Co-reference Resolution (CO)
Identifies identity relations between entities in texts.
Template Element construction (TE)
Adds descriptive information to NE results (using CO).
Template Relation construction (TR)
Finds relations between TE entities.
Scenario Template production (ST)
Fits TE and TR results into specified event scenarios.
89. AIML = Artificial Intelligence Markup Language
Alice
90. A.L.I.C.E. (Artificial Linguistic Internet Computer
Entity)
An award-winning free natural language artificial
intelligence chat robot.
Rule-based
Human-like answers without a complicated “brain”
Multi-language
92. NLP course, Husni Al-Muhtaseb
Lexical descriptions for Vietnamese language
processing.
en.wikipedia.org
www.xulyngonngu.com