CS60057 Speech & Natural Language Processing, Autumn 2007, Lecture 5, 2 August 2007
WORDS The Building Blocks of Language
Tokens, Types and Texts
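A minimal sketch of the token/type distinction named in this slide title: tokens are the running words of a text, types are the distinct word forms among them. The slide body did not survive extraction, so the example sentence, the tokenization rule, and the helper name `count_tokens_and_types` are illustrative assumptions rather than the lecture's own material.

```python
import re

def count_tokens_and_types(text):
    """Return (number of tokens, number of types) for a text.

    Assumption: a token is a maximal run of letters or apostrophes,
    compared case-insensitively. Real tokenizers make other choices.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(tokens), len(set(tokens))

n_tokens, n_types = count_tokens_and_types(
    "the cat sat on the mat because the mat was warm"
)
print(n_tokens, n_types)
```

The sentence has more tokens than types because "the" and "mat" repeat; the ratio of types to tokens is one simple measure of lexical variety.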
Extracting text from the Web
Extracting text from NLTK Corpora
Brown Corpus
Corpus Linguistics
What’s a word?
Another example
Some Useful Empirical Observations
Common words in Tom Sawyer (but words in natural language have an uneven distribution…)
Text properties (formalized): sample word frequency data
Frequency of frequencies
Zipf’s Law
Zipf curve
Predicting Occurrence Frequencies
Fraction of words with frequency n is 1/(n(n+1)). Fraction of words appearing only once is therefore ½.
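A short sketch of where the ½ comes from, under an idealized Zipf assumption (the derivation on the slide itself did not survive extraction, so this is the standard argument, not necessarily the lecturer's). If the word of rank r occurs k/r times, a word occurs exactly n times when k/(n+1) < r ≤ k/n, which covers about k/n − k/(n+1) = k/(n(n+1)) ranks. The rarest word occurs once at rank k, so the vocabulary has about k words, and the fraction with frequency n is 1/(n(n+1)). The function name `fraction_with_frequency` is an assumption for illustration.

```python
from fractions import Fraction

def fraction_with_frequency(n):
    """Fraction of vocabulary words occurring exactly n times,
    assuming an idealized Zipf distribution (frequency = k / rank)."""
    return Fraction(1, n * (n + 1))

# n = 1 gives 1/2, n = 2 gives 1/6, and so on.
for n in range(1, 6):
    print(n, fraction_with_frequency(n))

# The fractions telescope: summing n = 1..N gives N / (N + 1),
# so they account for the whole vocabulary as N grows.
total = sum(fraction_with_frequency(n) for n in range(1, 1000))
print(float(total))
```

Real corpora only follow this roughly, but the prediction that about half of all word types occur once is a useful rule of thumb.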
Explanations for Zipf’s Law
Zipf’s First Law
Zipf’s Second Law
Zipf’s Third Law
Zipf’s Law Impact on Language Analysis
Vocabulary Growth
Heaps’ Law
Heaps’ Law Data
Word counts are interesting...
Zipf’s Law on Tom Sawyer
Plot of Zipf’s Law
Plot of Zipf’s Law (cont’d)
Zipf’s Law, so what?
N-Grams and Corpus Linguistics
N-grams & Language Modeling
A bad language model
Herman is reprinted with permission from LaughingStock Licensing Inc., Ottawa, Canada. All rights reserved.
What’s a Language Model
What’s a language model for?
Next Word Prediction
Human Word Prediction
Claim
Applications
Overview
Simple N-Grams
N-grams
Computing the Probability of a Word Sequence
Bigram Model
Using N-Grams
The n-gram Approximation
n-grams, continued
N-grams for Language Generation
Unigram: 5. …Here words are chosen independently but with their appropriate frequencies.
REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.
Bigram: 6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
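The two approximations quoted above can be sketched in a few lines: a unigram generator draws each word independently in proportion to its frequency, while a bigram generator draws each word in proportion to how often it followed the previous one. The toy corpus, the function names, and the fallback to the unigram distribution for words with no observed follower are all assumptions made for this illustration.

```python
import random
from collections import Counter, defaultdict

def train(tokens):
    """Collect unigram counts and, for each word, counts of its followers."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        bigrams[prev][cur] += 1
    return unigrams, bigrams

def sample(counter, rng):
    """Draw one word with probability proportional to its count."""
    words = list(counter)
    weights = [counter[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate_unigram(unigrams, length, rng):
    """Words chosen independently with their corpus frequencies."""
    return [sample(unigrams, rng) for _ in range(length)]

def generate_bigram(unigrams, bigrams, length, rng):
    """Each word chosen according to what followed the previous word."""
    word = sample(unigrams, rng)
    out = [word]
    while len(out) < length:
        followers = bigrams.get(word)
        # Fall back to the unigram distribution if nothing ever followed `word`.
        word = sample(followers, rng) if followers else sample(unigrams, rng)
        out.append(word)
    return out

# Hypothetical toy corpus, lower-cased and whitespace-tokenized.
corpus = ("i want to eat chinese food i want to eat lunch "
          "i would like to eat thai food").split()
unigrams, bigrams = train(corpus)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(" ".join(generate_unigram(unigrams, 10, rng)))
print(" ".join(generate_bigram(unigrams, bigrams, 10, rng)))
```

With a corpus this small the bigram output mostly replays training phrases; the contrast with the unigram output is still visible because every adjacent pair in the bigram output was actually observed.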
N-Gram Models of Language
Counting Words in Corpora
Terminology
Corpora
Simple N-Grams
Computing the Probability of a Word Sequence
Bigram Model
Using N-Grams
Training and Testing
A Simple Example
A Bigram Grammar Fragment from BERP

Eat on       .16      Eat Thai        .03
Eat some     .06      Eat breakfast   .03
Eat lunch    .06      Eat in          .02
Eat dinner   .05      Eat Chinese     .02
Eat at       .04      Eat Mexican     .02
Eat a        .04      Eat tomorrow    .01
Eat Indian   .04      Eat dessert     .007
Eat today    .03      Eat British     .001

<start> I    .25      Want some            .04
<start> I’d  .06      Want Thai            .01
<start> Tell .04      To eat               .26
<start> I’m  .02      To have              .14
I want       .32      To spend             .09
I would      .29      To be                .02
I don’t      .08      British food         .60
I have       .04      British restaurant   .15
Want to      .65      British cuisine      .01
Want a       .05      British lunch        .01
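As a worked use of the fragment above, a bigram model scores a sentence by multiplying the probability of each word given the word before it, starting from `<start>`. The sketch below copies six entries from the fragment (lower-casing the first letters of "Want", "To" and "Eat"); the sentence chosen and the names `BIGRAM_P` and `sentence_probability` are assumptions for illustration.

```python
# Bigram probabilities copied from the BERP fragment: P(second | first).
BIGRAM_P = {
    ("<start>", "I"): 0.25,
    ("I", "want"): 0.32,
    ("want", "to"): 0.65,
    ("to", "eat"): 0.26,
    ("eat", "British"): 0.001,
    ("British", "food"): 0.60,
}

def sentence_probability(words, table):
    """Multiply P(w_i | w_{i-1}) over the sentence, starting from <start>."""
    prob = 1.0
    prev = "<start>"
    for w in words:
        prob *= table[(prev, w)]  # raises KeyError for a bigram not in the table
        prev = w
    return prob

p = sentence_probability("I want to eat British food".split(), BIGRAM_P)
print(f"{p:.3e}")
```

The product is tiny even for a plausible sentence, which is why implementations usually add log probabilities instead of multiplying raw ones.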
BERP Bigram Counts (rows: first word; columns: second word)

          I     Want  To    Eat   Chinese  Food  Lunch
I         8     1087  0     13    0        0     0
Want      3     0     786   0     6        8     6
To        3     0     10    860   3        0     12
Eat       0     0     2     0     19       2     52
Chinese   2     0     0     0     0        120   1
Food      19    0     17    0     0        0     0
Lunch     4     0     0     0     0        1     0

BERP Bigram Probabilities
Unigram counts:

I      Want   To     Eat   Chinese  Food   Lunch
3437   1215   3256   938   213      1506   459
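A bigram probability is the bigram count divided by the count of the first word. The sketch below applies that to the BERP numbers, reading the flattened count table with "I" as the first row and first column (an interpretation of the extracted slide, though it reproduces the .32, .65 and .26 listed in the grammar fragment). The names `WORDS`, `BIGRAM_COUNTS`, `UNIGRAM_COUNTS` and `bigram_probability` are assumptions for illustration.

```python
WORDS = ["I", "want", "to", "eat", "Chinese", "food", "lunch"]

# Rows: first word. Columns: second word, in WORDS order.
BIGRAM_COUNTS = {
    "I":       [8, 1087, 0, 13, 0, 0, 0],
    "want":    [3, 0, 786, 0, 6, 8, 6],
    "to":      [3, 0, 10, 860, 3, 0, 12],
    "eat":     [0, 0, 2, 0, 19, 2, 52],
    "Chinese": [2, 0, 0, 0, 0, 120, 1],
    "food":    [19, 0, 17, 0, 0, 0, 0],
    "lunch":   [4, 0, 0, 0, 0, 1, 0],
}

UNIGRAM_COUNTS = {
    "I": 3437, "want": 1215, "to": 3256, "eat": 938,
    "Chinese": 213, "food": 1506, "lunch": 459,
}

def bigram_probability(first, second):
    """Maximum-likelihood estimate: C(first second) / C(first)."""
    count = BIGRAM_COUNTS[first][WORDS.index(second)]
    return count / UNIGRAM_COUNTS[first]

for first, second in [("I", "want"), ("want", "to"), ("to", "eat")]:
    print(first, second, round(bigram_probability(first, second), 2))
```

Many cells are zero, so the unsmoothed estimate assigns probability 0 to any sentence containing an unseen bigram such as "I to".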
What do we learn about the language?
Approximating Shakespeare
N-Gram Training Sensitivity
Some Useful Empirical Observations
Smoothing Techniques
Add-one Smoothing
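A minimal sketch of add-one (Laplace) smoothing for bigrams: add 1 to every bigram count and add the vocabulary size V to the denominator, so unseen bigrams get a small non-zero probability and each row still sums to 1. The slide body did not survive extraction, so this is the textbook form of the technique rather than the lecture's own wording. The counts come from the BERP tables earlier in the deck; the vocabulary size 1616 is an assumed value, and the function name is hypothetical.

```python
def add_one_probability(bigram_count, first_word_count, vocab_size):
    """Add-one estimate of P(second | first):
    (C(first second) + 1) / (C(first) + V)."""
    return (bigram_count + 1) / (first_word_count + vocab_size)

V = 1616  # assumed vocabulary size, not stated in the surviving slide text

# Seen bigram "I want": C = 1087, C(I) = 3437.
seen = add_one_probability(1087, 3437, V)
# Unseen bigram "I to": C = 0.
unseen = add_one_probability(0, 3437, V)

print(round(seen, 4), round(unseen, 6))
```

Note how much the seen bigram drops (from about .32 unsmoothed) to pay for all the unseen ones; that heavy discount is the usual motivation for the other methods in this part of the deck.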
Witten-Bell Discounting
Good-Turing Discounting
Backoff methods (e.g. Katz ’87)
Summary
