1
LANGUAGE
Technology
日
本
語動互機人
1
Human Language Technology
2
Natural Language Processing
Computational Linguistics
3
NLP
• Computation
• Linguistics
4
Information Retrieval
Search Engine
5
IR
• Vector Space Model (tf-idf)
• Latent Semantic Analysis
• Link Analysis
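A minimal sketch of the vector space model with tf-idf weighting, assuming whitespace-tokenized toy documents. The function names `tfidf_vectors` and `cosine` and the sample sentences are illustrative and not from the slides; the weighting is plain tf × log(N/df), one of several common variants.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy collection.
docs = [
    "the volcano erupted on the island".split(),
    "the fire spread over the mountain".split(),
    "the search engine ranked the volcano page".split(),
]
vecs = tfidf_vectors(docs)
sim_volcano = cosine(vecs[0], vecs[2])  # share the content word "volcano"
sim_fire = cosine(vecs[0], vecs[1])     # share only the stop-word "the"
```

Because "the" occurs in every document its idf is zero, so only content-bearing terms contribute to the similarity.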
6
Human-Computer Interaction
Applied Psychology
7
HCI
• Effectiveness
• Efficiency
• Satisfaction
8
What are doable tasks?
http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
9
Everything is labeling
10
e.g. MeCab
http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html
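One way to read "everything is labeling" is that word segmentation can be recast as tagging each character. The sketch below is an assumption for illustration, not MeCab's API: it uses a two-tag scheme (B = begins a word, I = inside a word) and hypothetical helper names.

```python
def words_to_tags(words):
    """Label each character B (begins a word) or I (inside a word)."""
    tags = []
    for word in words:
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags

def tags_to_words(text, tags):
    """Rebuild the word sequence from characters and their B/I labels."""
    words = []
    for ch, tag in zip(text, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words

# Two analyses of the same string, as in [[台北]市] vs. a single word.
two_words = words_to_tags(["台北", "市"])
one_word = words_to_tags(["台北市"])
restored = tags_to_words("台北市", two_words)
```

The two segmentations of the same three characters differ only in the label of the last character, which is what lets a sequence labeler learn segmentation.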
11
“Essentially, all models are wrong, but some are useful.”
–– George E. P. Box
12
What’s the niche?
Know where the limit is.
13
快、狠、準 (fast, fierce, accurate).
The Slap of a Thousand Exploding Suns.
(http://en.wikipedia.org/wiki/Slapsgiving_3:_Slappointment_in_Slapmarra)
(http://tune.pk/video/1866501/how-i-met-your-mother-slapsgiving-3-slappointment-in-slapmara-preview)
14
What can WE do?
• Be creative in combining tasks, e.g.
• MT → Summarization
• Deception detection
15
16
<(_ _)>
https://www.coursera.org/course/nlangp
Fundamental Unit?
a meta-communication
18
What is a Word?
to linguistics
19
“... the smallest free form that may
be uttered in isolation with semantic
or pragmatic content (with literal or
practical meaning) ...”
http://en.wikipedia.org/wiki/Word
20
“... the task of defining what
constitutes a ‘word’ involves
determining where one word ends
and another word begins...”
http://en.wikipedia.org/wiki/Word#Word_boundaries
21
Word Boundary?
• Orthographic
• Sociological
• Lexical
• Semantic
• Phonological
• Morphological
• Syntactic
• Psycholinguistic
22
Orthographic Word
• Writing convention
• Space
• How about Ancient Greek?
• OED: africanization vs. americanization
23
Sociological Word
• Between a phoneme and a sentence
• 字 (zi, character) vs. 詞 (ci, word)
24
Lexical Word
• Listedness: cannot be generated “on-line”
• Dictionary entry?
• Orthographic?
• Idiomatic phrase?
• “Kick the bucket”
25
Semantic Word
• Difficult to define
• Without phonological form?
• Closer to morpheme?
• “Bio-”
26
Phonological Word
• Prosodic word?
• Disyllabic Chinese words
• Hyphenation? Syllabification?
27
Morphological Word
• Bound root: “bio-”
• [[貓頭]鷹] (owl): is [貓頭] (cat head) a word here?
• [[台北]市] vs. Taipei city
28
Syntactic Word
• Minimally occupy the category slots
• Hence [鴨] in [[鴨]子] is NOT a word
• Just like “duck” in “duck-ie”
29
Psycholinguistic Word
• All of the above?
30
What is a Word?
to computational linguistics
31
Standard de jure?
• Academia Sinica Balanced Corpus
• Chinese Treebank of the University of Pennsylvania
• City University of Hong Kong
• Microsoft Research Asia
• Peking University
32
Comparison
Pattern      CTB              China   ASBC   Example
ABAB         ABAB             AB-AB   ABAB   研究研究
AA看         [AA/V-看/V]/V    AA看    AA看   説説看
Person Name  One              Two     One
Noun-們      One              Two     Two    朋友們
Ordinals     One              Two     Two    第一
... then match standards
the more accurate, the better the communication?
34
Partial Match?
Or dictionary, concordance, collocation, etc?
cross-lingual
information retrieval
Evaluation Examples
• Gold standard
• [[meta][data]] / is / the / data / of / data
• 5 boundaries, 7 morphemes, 6 words, 5 lexicon types
• Test subject
• meta / data / is / the / data / of / data
• 1 boundary error, 0 morpheme errors, 2 word errors, 1 lexicon type error
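The counts in this example can be reproduced with a short sketch. The error definitions below are assumptions chosen to match the slide's numbers: a boundary error is a boundary offset present in only one of the two segmentations, a word (or morpheme) error is a test span that is not a gold word (or morpheme) span, and a lexicon type error is a gold word type missing from the test output. The function names are hypothetical.

```python
def spans(tokens):
    """Return (start, end) character spans for a token sequence."""
    out, pos = [], 0
    for tok in tokens:
        out.append((pos, pos + len(tok)))
        pos += len(tok)
    return out

def boundaries(tokens):
    """Internal boundary offsets (the end of the text is not counted)."""
    return {end for _, end in spans(tokens)[:-1]}

def compare(gold_words, gold_morphemes, test_words):
    """Count segmentation errors of test_words against the gold standard."""
    # All three analyses must cover the same character string.
    assert "".join(gold_words) == "".join(gold_morphemes) == "".join(test_words)
    gold_spans = set(spans(gold_words))
    morph_spans = set(spans(gold_morphemes))
    test_spans = spans(test_words)
    return {
        "boundary_errors": len(boundaries(gold_words) ^ boundaries(test_words)),
        "morpheme_errors": sum(s not in morph_spans for s in test_spans),
        "word_errors": sum(s not in gold_spans for s in test_spans),
        "lexicon_type_errors": len(set(gold_words) - set(test_words)),
    }

gold_words = ["metadata", "is", "the", "data", "of", "data"]
gold_morphemes = ["meta", "data", "is", "the", "data", "of", "data"]
test_words = ["meta", "data", "is", "the", "data", "of", "data"]
result = compare(gold_words, gold_morphemes, test_words)
```

A single extra boundary inside "metadata" costs one boundary error but two word errors, which is why the choice of unit changes how harshly the same mistake is scored.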
38
Term Type
• Kwok (2002)
• Insensitive: stop-words; frequent non-content-bearing
• Monotonic: content-bearing
• Non-monotonic:
• 西土耳其 (Western Turkey)
• Semantic, syntax, or surface?
• 农 (agricultural) / 作物 (crops)
• 旱 (drought) / 灾 (disaster) vs. 春旱 (Spring drought) vs. 旱区 (area of drought disaster)
• Recall or precision?
• 火 (fire) / 山 (mountain) vs. 火山 (volcano)
39
Surface Pattern
• Ambiguity
• Combinatorial
• 西土耳其、农作物、旱灾、春旱、旱区、火山, etc.
• Overlapping
• 施政 (practice policy) / 伟 (great) vs. 施 (Shih) / 政伟 (Zheng-Wei)
• Which is more harmful?
http://www.definicionabc.com/general/gestalt-psicologia.php
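Both kinds of ambiguity show up when every dictionary-consistent segmentation is enumerated. This is a minimal sketch assuming a tiny hand-made lexicon; the function name `segmentations` and the lexicon contents are illustrative, not a resource from the slides.

```python
def segmentations(text, lexicon):
    """Return every way to split text into words found in lexicon."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results

# Hypothetical lexicon covering the two examples above.
lexicon = {"火", "山", "火山", "施", "政", "伟", "施政", "政伟"}

# Combinatorial ambiguity: a word whose parts are also words.
volcano = segmentations("火山", lexicon)

# Overlapping ambiguity: 政 can attach to the left (施政) or the right (政伟).
name = segmentations("施政伟", lexicon)
```

Plain recursion is exponential in the worst case, so this is only suitable for short strings; a practical segmenter would score the candidates rather than list them all.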
40
Tractable Simulation?
http://imgs.xkcd.com/store/glen_shirts/g_try_science_shirt_2.jpg
41