A Panorama of
Natural Language
Processing
Ted Xiao
Overview
• Background
• Grammars
• Word Representation
• Modern NLP
• Future Directions
• NLP in Industry
• Demos
What is Natural Language
Processing?
NLP!
Goal: have computers understand natural
language in order to perform useful tasks
Artificial Languages: Java, C++,
Binary…
Natural Language: Language
spoken by people.
Motivation: sophisticated linguistic analysis that achieves human-like understanding for a range of tasks and applications.
Task Types
● Syntax
○ Parsing
○ Stemming
○ Part of speech tagging
● Semantics
○ Machine Translation
○ Natural Language
Understanding, Generation
○ OCR
○ QA, Sentiment Analysis
○ Coreference
● Discourse
○ Parsing
○ Stemming
○ Part of speech tagging
● Speech
○ Speech Recognition
○ Text-to-Speech
○ Speech-to-Text
Task Examples
NLP in Industry
What Makes NLP Difficult?
• We don’t understand language ourselves
• Language encodes meaning
• Language is learned intuitively - easy for children, hard for computers
• Ambiguity
• Language is symbolic
• Subtleties: sarcasm, wordplay, idioms...
NLP vs. PLP
Programming Language Processing is easier than Natural Language Processing
Examples of ambiguity: news headlines
● The Pope’s baby steps on gays
● Scientists study whales from space
● Juvenile court to try shooting defendant
● Boy paralyzed after tumor fights back to gain black belt
An NLP Disaster:
Microsoft Tay (March 2016)
...again?!
Microsoft Zo (December 2016)
Overview
• Background
• Grammars
• Word Representation
• Modern NLP
• Future Directions
• NLP in Industry
• Demos
Grammars
● A grammar is a formal description of the structure of a language
● The skeleton of any language
Basic Linguistics
(Figure: levels of linguistic analysis: context, form, meaning, structure, and audio)
Levels of NLP
Overview
• Background
• Grammars
• Word Representation
• Modern NLP
• Future Directions
• NLP in Industry
• Demos
Digitizing Natural Language
• We need some measure of similarity and difference between words
• Vectors can do this!
• We can use vector operations to gauge similarity between words
Word Vectors
• 13 million tokens in the English language
• Many words are similar (cat and feline, man and woman, etc…)
• A nice idea: encode word tokens into a vector that is a point in some
word space with dimension << 13 million
One-Hot Vectors
• Express each word as an |V| dimensional vector with one 1 and
the rest 0s, where |V| is the size of our vocabulary
• One-hot vectors for a dictionary would look like:
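A minimal sketch of one-hot encoding, assuming a toy five-word vocabulary (the names vocab and one_hot are illustrative):

```python
import numpy as np

vocab = ["cat", "dog", "feline", "man", "woman"]   # toy vocabulary, |V| = 5
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # Return the |V|-dimensional vector with a single 1 at the word's index
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

print(one_hot("dog"))   # [0. 1. 0. 0. 0.]
```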
What’s Wrong?
• One hot vectors are independent (orthogonal)
• But some words are similar!
• A nicer idea: reduce the size of the space from |V| to a smaller-dimensional subspace that encodes relationships between words
Quick Aside: Singular Value Decomposition (SVD)
X = USVᵀ, where X is (m×n), U is the (m×m) matrix of left singular vectors, S is the (m×n) matrix with the singular values of X on its diagonal, and V is the (n×n) matrix of right singular vectors.
Take-away point: truncating the SVD of X to its top k singular values gives the best rank-k approximation of X.
Illustration of the SVD as a
Rank-k Approximation
SVD-Based Methods: Window-Based Co-occurrence Matrix
• Only count the number of times a word appears inside a window of a particular size around the word of interest
• Consider the following three documents (window size = 1):
• I enjoy flying
• I like NLP
• I like deep learning
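A sketch of building that window-based co-occurrence matrix for the three documents above (assuming simple whitespace tokenization):

```python
import numpy as np

docs = [["I", "enjoy", "flying"],
        ["I", "like", "NLP"],
        ["I", "like", "deep", "learning"]]
vocab = sorted({w for doc in docs for w in doc})
idx = {w: i for i, w in enumerate(vocab)}

window = 1
X = np.zeros((len(vocab), len(vocab)))   # co-occurrence counts
for doc in docs:
    for i, w in enumerate(doc):
        lo, hi = max(0, i - window), min(len(doc), i + window + 1)
        for j in range(lo, hi):
            if j != i:                   # skip the word itself
                X[idx[w], idx[doc[j]]] += 1
```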
Applying the SVD to the Co-occurrence Matrix
• X = USVᵀ
• Truncate S at some index k based on the amount of variance captured
• Take the sub-matrix U_{1:|V|, 1:k} to be our word embeddings
• We now have a k-dimensional representation of every word in our vocabulary!
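A NumPy sketch of the truncation step, reusing the co-occurrence matrix X from the previous sketch (k = 2 is an arbitrary illustrative choice):

```python
import numpy as np

# X: the |V| x |V| co-occurrence matrix built above
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                            # choose k from the captured variance
variance_captured = (S[:k] ** 2).sum() / (S ** 2).sum()
embeddings = U[:, :k]                            # one k-dimensional vector per word
```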
Downfalls of SVD-based Methods
• Co-occurrence matrix is high dimensional and sparse
• SVDs are computationally expensive (quadratic cost)
• Dimensions of the co-occurrence matrix are constantly
changing as new words are added
• Solution: iteration-based methods!
Iteration-Based Methods
• Word vectors and word embeddings: used to find similarity, and to compute and store representative information about a huge dataset
• Iteration-based methods: create a model that learns one iteration at a time and eventually encodes the probability of a word given its context
• These include basic language models as well as the more advanced word2vec
Basic Language Models
• Question: Are there ways we can maintain
information about word order and meaning?
• Bag of Words
• Just count the frequencies of words.
• Issues: High dimension, and order and
relations are lost
• Term Frequency-inverse Document Frequency
• AKA TF-IDF
• How important a word is in a document
• Used in search engines!
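A quick sketch of both models using scikit-learn (an assumed dependency; the three toy documents are reused from earlier):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["I enjoy flying", "I like NLP", "I like deep learning"]

bow = CountVectorizer().fit_transform(docs)      # raw word counts per document
tfidf = TfidfVectorizer().fit_transform(docs)    # counts re-weighted by rarity
```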
Language Models
• Goal: assign a probability to a sequence of tokens
• Consider the two sentences:
• “The dog wagged his tail”
• “Puffer fish bank ladder”
• Which should have a higher probability?
• If we assume that word occurrences are independent, the probability of any given sequence of words w_1, …, w_n is P(w_1, …, w_n) = ∏_i P(w_i) (unigram model)
What if Word Occurrences Are
Not Independent?
• Assume the probability of a sequence depends on the pairwise
probability of a word in the sequence and a word next to it (bigrams)
• A bigram model is of the form P(w_1, …, w_n) = ∏_i P(w_i | w_{i-1})
• The general N-gram model is given by P(w_1, …, w_n) = ∏_i P(w_i | w_{i-n+1}, …, w_{i-1})
Approaches So Far
● Simple models trained on huge amounts of data outperform complex
models trained on small amounts of data
● Unigrams: P(w_i)
● Bigrams: P(w_i | w_{i-1})
● N-grams: P(w_i | w_{i-n+1}, …, w_{i-1})
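A minimal count-based bigram model, assuming a toy corpus and unsmoothed maximum-likelihood estimates:

```python
from collections import Counter

tokens = "the dog wagged his tail".split()          # toy corpus
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    # P(word | prev) estimated from raw counts (no smoothing)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("dog", "wagged"))   # 1.0 in this tiny corpus
```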
Continuous Bag of Words (CBOW)
• Consider part of the sequence of words as context, and try to predict the center word
• Sentence: “The dog wagged his tail”
• Context: {“The”, “dog”, “his”, “tail”}
• Center word: “wagged”
• Our known parameters are the words of the sentence in question, represented by one-hot vectors
• Let x(c) denote the context words
• Let y(c) denote the target word (output)
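A minimal NumPy sketch of a single CBOW forward pass (all sizes are illustrative; real training would update W_in and W_out by gradient descent):

```python
import numpy as np

vocab = ["the", "dog", "wagged", "his", "tail"]
idx = {w: i for i, w in enumerate(vocab)}
V, d = len(vocab), 10                     # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, d))            # input (context) embeddings
W_out = rng.normal(size=(d, V))           # output (center-word) weights

context = ["the", "dog", "his", "tail"]
h = W_in[[idx[w] for w in context]].mean(axis=0)   # average the context embeddings
scores = h @ W_out
probs = np.exp(scores) / np.exp(scores).sum()      # softmax over the vocabulary
print(vocab[int(np.argmax(probs))])       # ideally "wagged" after training
```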
Continuous Bag of Words
Skip-grams
• Now, consider the center word as context and try to predict
surrounding words
• Sentence: “The dog wagged his tail”
• Context: “wagged”
• Surrounding words: {“The”, “dog”, “his”, “tail”}
• Nearly identical set-up to CBOW, except we switch our x and y
• Input: one-hot vector (context word)
• Output: vectors describing the surrounding words
Skip-grams
Recap
● We first tried condensing language into word vectors
○ We want to keep meaning in a lower dimension
○ One-hot Vectors, SVD… These are expensive!
● We then tried iteration based methods
○ Language Models: Bag of Words, TF IDF… no context!
● We add in context with N-gram models
● We extended these with CBOW and Skip-grams
○ CBOW: Predict center word
○ Skip-grams: Predict surrounding words
Learning Word-Sequence Probabilities
• So far, we have an expression for the chance that a sequence of words appears, as a product of conditional probabilities
• Now, we’d like models that can learn the probabilities of word sequences
• Solution: word2vec (Mikolov et al., 2013)
Word2vec
• A neural network implementation that learns distributed representations
for words
• 2 algorithms
• Continuous bag of words
• Skip-grams
• 2 training methods
• Negative sampling
• Hierarchical softmax
• Best part: DOES NOT NEED LABELED DATA!
Word2vec
• Many of those steps are complicated...
• Luckily, someone made software that does this for us
• Gensim is a Python package that can do all of this complicated word2vec stuff in a few lines of code
• Results: training high-dimensional word vectors on a large amount of data captures “subtle semantic relationships between words”
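A minimal Gensim sketch (Gensim 4.x parameter names; the toy corpus is illustrative, and real training needs far more text):

```python
from gensim.models import Word2Vec

sentences = [["the", "dog", "wagged", "his", "tail"],
             ["the", "cat", "flicked", "her", "tail"]]   # toy corpus

model = Word2Vec(
    sentences,
    vector_size=100,   # embedding dimension
    window=2,          # context window size
    min_count=1,       # keep every token in this tiny corpus
    sg=1,              # 1 = skip-gram, 0 = CBOW
    negative=5,        # negative sampling (hs=1 would use hierarchical softmax)
)
vec = model.wv["dog"]   # the learned 100-dimensional vector for "dog"
```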
Reflecting on word2vec
• Words with similar meanings occur in clusters
• Clusters are spaced such that some word relationships (such as
analogies) can be reproduced with vector math
• Famous example (with highly trained word vectors)
• “king” - “man” + “woman” = “queen”
• Useful feature: word2vec does not require labeled data
• Most data in the world is unlabeled!
• Word embeddings are very useful for prediction and translation tasks, as
well as sentiment analysis
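With well-trained vectors (large pretrained embeddings; the tiny model above would not show this), the famous analogy can be checked as a vector-arithmetic query via Gensim:

```python
# king - man + woman ~= queen
model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
# -> [("queen", ...)] with sufficiently trained vectors
```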
Overview
• Background
• Grammars
• Word Representation
• Modern NLP
• Future Directions
• NLP in Industry
• Demos
Modern NLP
• Advances in NLP were largely driven by
• A vast increase in computing power
• A better understanding of human language
• Development of successful ML algorithms
• Big data
• Much of current work involves:
• Machine translation
• Spoken dialogue and conversational agents
• Machine reading
• Mining social media
• Analysis and generation of speaker state
Forms of NLP Data
• User data
• Corpora
• Dictionaries
• Ontologies and databases
NLP Data Sources
• Wikimedia
• APIs: Twitter, Wordnik, …
• Common Crawl
• WordNet
• Linguistic Data Consortium (www.ldc.upenn.edu)
• University sites and the academic community
• Stanford, Oxford, CMU
• Create your own!
• Web-scrape, crowd-source, linguists
Deep Learning vs. Non-Deep Learning Methods
• Bag of words may outperform deep learning models on modest-sized datasets
• Word2vec sees a drastic improvement with a LOT of text
• In the literature, distributed word vector techniques outperform bag-of-words models
Deep learning tries to capture the recursive nature of natural language
Deep Learning for NLP
• Deep learning attempts to learn multiple levels of representation of increasing
complexity and abstraction
• Want computers to be able to understand the recursive nature of human
language
• Recursive/recurrent neural networks!
• DL models can be fast ways to solve NLP tasks
Recurrent Neural Network
● Recurrent Neural Network
○ Connections between units form directed cycles
○ The internal state of the network allows it to exhibit dynamic temporal behavior
○ Success in speech recognition, natural language, translation, etc.
● Long Short-Term Memory: LSTM
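A minimal PyTorch sketch of running an LSTM over a batch of embedded token sequences (all sizes here are illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=50, hidden_size=128, batch_first=True)

x = torch.randn(2, 7, 50)      # batch of 2 sequences, 7 tokens, 50-dim embeddings
output, (h_n, c_n) = lstm(x)   # output: (2, 7, 128); h_n/c_n: final hidden/cell state
```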
Recurrent Neural Network
seq2seq
● RNNs applied to sequences
○ Generate a response based on meaningful input
○ For example, translate from English to French
● Two RNNs: an encoder that processes the input and a decoder that generates the output.
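A compact encoder-decoder sketch in PyTorch, assuming toy vocabulary sizes and no attention (modern systems add it):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))       # h summarizes the source sentence
        dec, _ = self.decoder(self.tgt_emb(tgt), h)  # decoder starts from that summary
        return self.out(dec)                         # scores over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (2, 9)),       # 2 source sentences, 9 tokens
               torch.randint(0, 1200, (2, 11)))      # 2 target sentences, 11 tokens
```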
Recursive Deep Learning
• Compositional vector grammars (parsing)
• Recursive autoencoders (paraphrase detection)
• Matrix-vector RNNs (relation classification)
• Recursive neural tensor networks (sentiment analysis)
What’s at UC Berkeley?
• Berkeley NLP Research - Dan Klein
• Computer Vision - Alexei Efros, Jitendra Malik
CV + NLP: Visual Question Answering
Overview
• Background
• Grammars
• Word Representation
• Modern NLP
• Future Directions
• NLP in Industry
• Demos
What the (near) future holds
Bots!
Think Siri, but actually functional instead of a toy
What the (near) future holds
Supporting invisible UI!
The concept of invisible or zero user interaction between user and
machine
What the (near) future holds
Smarter search!
The same capabilities that allow a chatbot to understand a customer’s request can enable “search like you talk” functionality
What the (near) future holds
Intelligence from unstructured information!
Analysis that accurately understands the subtleties of natural language (choice of words, tone, etc.) can provide useful knowledge and insight from information
Overview
• Background
• Grammars
• Word Representation
• Modern NLP
• Future Directions
• NLP in Industry
• Demos
NLP in Industry
NLP Architectures
• Layered Model
• Preprocessing
• Low-level analysis
• Semantic Analysis
• Conversion to end products
• Input/Output as API Structure
NLP at Scale
• Systems come before algorithms
• Objective functions are messy
• Everything is changing
• Understanding-optimization trade-off
Developing an NLP system
1. Exploration
a. Translate real-world requirements into
a measurable goal
b. Find an appropriate level and
representation
c. Find data for experiments
2. Development
a. Find and utilize existing tools and
frameworks
b. Set up and perform a series of
experiments
3. Production
a. CPU/GPU intensive
b. Most NLP frameworks
are not production-ready
c. Pre- and post-processing is invaluable
d. Collect user feedback
I Have the Model… Now What?
1. Specify Performance Requirements
2. Separate Prediction Algorithm From Model
Coefficients
a. Select or Implement The Prediction Algorithm
b. Serialize Your Model Coefficients
3. Develop Automated Tests For Your Model
4. Develop Back-Testing and Now-Testing
Infrastructure
5. Challenge Then Trial Model Updates
Tips for NLP
• Proper preprocessing is VERY important
• Know your domain!
• Validate your models!
• Human judges
• Cross-validation
Overview
• Background
• Grammars
• Word Representation
• Modern NLP
• Future Directions
• NLP in Industry
• Demo
Programming Language Identification
Exploring code on GitHub
Goal: figure out what language a file uses
Potential methods?
Filename, keywords, comments
Whitespace, syntax
Must be scalable to handle a large number of constantly updating repos
Existing Model
Linguist: heuristics + Naive Bayes
Heuristics can be accurate but require updating and fine-tuning
Naive Bayes depends on word frequencies - prediction cost is linear in vocabulary size
Hard-coded rules do most of the work, leaving Naive Bayes as a last resort
Heavily dependent on file-extension classification
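A toy sketch of the Naive Bayes piece with scikit-learn (hypothetical snippets and labels; this is not Linguist’s actual implementation):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

snippets = ["def foo():\n    return 1",          # hypothetical training files
            "int main() { return 0; }",
            "SELECT * FROM users;"]
labels = ["Python", "C", "SQL"]

clf = make_pipeline(CountVectorizer(token_pattern=r"\S+"), MultinomialNB())
clf.fit(snippets, labels)                        # word frequencies -> language
print(clf.predict(["def bar(x): return x"]))     # ['Python']
```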
Selective Classification
With file extensions, only 87% of files are classified
Thank you!
ted@ml.berkeley.edu
Special thanks to Jordan Prosky
Appendix
What makes NLP difficult?
• Language is meant to convey meaning, which we have a natural
way of encoding
• Children learn this very fast!
• Hard for computers to learn…
• Language is a symbolic signaling system
• Example: “pen” (a writing instrument, or an enclosure?)
• Other subtleties: sarcasm, expressive signaling, …
Basics of NLP data preprocessing
• Domain specific!
• Tokenization
• Example: “This is a test that isn’t so simple”
• Tokens: “This”, “is”, “a”, “test”, “that”, “is”, “n’t”, “so”,
“simple”
• Regular expressions
• Stemming
• Lower-casing
• Removing/adding punctuation
• Other…
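A sketch of that tokenization with NLTK (an assumed dependency; its Penn Treebank tokenizer splits contractions exactly as in the example above):

```python
import nltk
nltk.download("punkt", quiet=True)        # one-time tokenizer model download
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

tokens = word_tokenize("This is a test that isn't so simple")
# ['This', 'is', 'a', 'test', 'that', 'is', "n't", 'so', 'simple']

tokens = [t.lower() for t in tokens]                  # lower-casing
stems = [PorterStemmer().stem(t) for t in tokens]     # stemming
```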
SVD-Based Methods
• Loop over a massive dataset to accumulate word co-occurrence counts in
some matrix X
• Perform the SVD on X to get U, S, and V
• Use the rows of U as the word embeddings for all words in your dictionary
X = ?
SVD-Based Methods: Word-Document Matrix
• Assumption: related words often appear in the same document
• Loop over many documents and, every time word i appears in document j, add one to entry X_ij
• Very high dimensional - let’s try something better
We are not quite done…
• Need to find suitable U and V matrices!
• Two algorithms help us get what we want:
• Hierarchical softmax
• Negative sampling
• These are complicated!
• Luckily, someone made software that does this for us
• Gensim is a Python package that can do all of this complicated word2vec stuff in a few lines of code
