Basic Natural Language Processing using Natural (JavaScript/Node) Library
Aniruddha Chakrabarti
AVP and Chief Architect, Digital, Mphasis
@anchakra | Linkedin.com/in/aniruddhac | slideshare.net/aniruddha.chakrabarti/
Agenda
• Emergence of Artificial Intelligence, AI First
• What is Natural Language Processing (NLP)
• Natural JavaScript/Node NLP Library
• Tokenization - Word Tokenizer
• Stemming and Lemmatization
• String Distance
• Inflectors
• Phonetics
• N-Grams
• Classifier
• tf-idf
• POS Tagger
• Spell Check
Third Era of Computing* - AI First/AI Everywhere (Cognitive Systems)
Tabulating Machines (1960 – 1980)
→ Turing Machine
→ Automating manual processes, tabulating data
→ Reducing manual effort and time
→ IBM System/360 (S/360), Mainframes, AS/400
Programmable Systems (1980 – 2010)
→ Computing power (Moore's Law)
→ Systems need to be explicitly programmed using explicit logic and rules (pre-programmed)
→ Personal Computers (PCs), communication (networked PCs, client/server, Internet, WWW)
→ Automating business processes
→ Mostly structured data
AI First/AI Everywhere (Cognitive Systems) (2010 – Current)
→ Systems that learn from historical data and can make predictions; not rule-based systems
→ Use Machine Learning and NLP to analyze unstructured data (text, image, audio, video)
→ Predictive Analytics, Deep Learning, Neural Nets
→ OCR, Speech recognition, Text to speech, Face recognition, Video analysis, …
→ Cognitive Services (pay-as-you-go model) – IBM Watson, Microsoft Cognitive Services, …
→ Robotics, Internet of Things, Conversational Systems, Wearables, blur of physical & virtual
→ Still mostly Weak AI / Narrow AI
Real AI (?)
→ Strong AI / Full AI
→ Artificial General Intelligence (AGI)
AI Winter / AI Summer
* From “The Computing Universe” by Tony Hey and Gyuri Pápay
• Artificial Intelligence has emerged as the third era of computing, after tabulating machines and programmable systems.
Gartner Hype Cycle … 2017
• AI technologies like Cognitive Computing, Virtual Assistants/Chatbots, Conversational AI, Machine Learning, Deep Learning and Autonomous Vehicles appear at the peak of the Gartner Hype Cycle for Emerging Technologies, 2017.
• Reinforcement Learning and Artificial General Intelligence (AGI) appear at the start of the hype cycle; they are expected to peak in the coming years.
Emergence of “AI Everywhere”
Gartner reckons AI is one of the three megatrends. AI technologies like Conversational UI, Machine Learning, Deep Learning and Cognitive Computing constitute “AI Everywhere”.
What is Natural Language Processing?
• Field of computer science, artificial intelligence and computational linguistics concerned
with the interactions between computers and human (natural) languages, and, in particular,
concerned with programming computers to fruitfully process large natural language corpora –
Wikipedia
• Broadly categorized into two areas -
▪ Natural Language Understanding (NLU)
▪ Natural Language Generation (NLG)
[Diagram: Natural Language Processing (NLP) comprises Natural Language Understanding (NLU) and Natural Language Generation (NLG)]
Some applications of NLP
• Spell correction (MS Word or any other editor)
• Search engines (Google, Bing, Yahoo, Wolfram Alpha)
• Speech engines (Siri, Google Voice, Cortana)
• Personal Voice Assistants (Amazon Alexa, Google Home, …)
• Spam classifiers (All e-mail services)
• News feeds (Google, Yahoo!, and so on)
• Machine translation (Google Translate, and so on)
• Chatbots, Intelligent Virtual Agent/IVA
• IBM Watson, Microsoft LUIS, Amazon Lex/Alexa
NLP Tools & Libraries
• GATE (Java)
• Mallet (Java)
• Apache OpenNLP (Java)
• Apache UIMA
• CoreNLP - Stanford CoreNLP toolkit (Java)
• Gensim (Python)
• Natural Language Toolkit / NLTK (Python) – by far the most popular NLP library & tool
• spaCy (Python)
• TextBlob (Python) – built on top of NLTK
• Natural Library (JavaScript/Node)
What is Natural?
• "Natural" is a general natural language processing library for Node.js.
• Supports basic NLP tasks like tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity and inflections
• At the moment, most of the algorithms are English-specific
• Created by Chris Umbel
• Loosely based on the NLTK (Python) NLP library
• https://github.com/NaturalNode/natural
• http://www.chrisumbel.com/article/node_js_natural_language_porter_stemmer_lancaster_bayes_naive_metaphone_soundex
Natural library install and setup
• Install using npm (the package manager for Node). Install it locally in the project folder, because require() does not look in the global (-g) install folder by default
• Include the Natural package through require
npm install natural
// include the natural library
const Natural = require('natural');
Tokenization
• A word (Token) is the minimal unit that a machine can understand and process.
• Tokenization is the process of splitting the raw string into meaningful tokens
• Raw text cannot be further processed without going through tokenization.
• Complexity of tokenization varies according to the need of the NLP application, and the
complexity of the language itself.
▪ In English it can be as simple as selecting only words and numbers with a regular expression. For Chinese and Japanese, which do not separate words with spaces, it is a much more complex task.
• Two primary types of tokenizers:
▪ Word Tokenizer: Tokenizes raw text to words
▪ Sentence Tokenizer: Tokenizes raw text to sentences
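To make the "simple regular expression" case for English concrete, here is a minimal plain-JavaScript sketch (no Natural dependency) of a word tokenizer and a naive sentence tokenizer. The function names wordTokenize and sentenceTokenize are hypothetical, and the sentence rule (split after ".", "!" or "?" when followed by whitespace) is an assumption for illustration; real sentence tokenizers also have to cope with abbreviations such as "Dr." and with decimal numbers.

```javascript
// Word tokenizer: keep runs of ASCII letters/digits, drop punctuation and whitespace.
function wordTokenize(text) {
  return text.match(/[A-Za-z0-9]+/g) || [];
}

// Naive sentence tokenizer: split after '.', '!' or '?' when whitespace follows.
function sentenceTokenize(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const raw = "Hello, how are you? I don't know you!";
const words = wordTokenize(raw);
// words: [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
const sentences = sentenceTokenize(raw);
// sentences: [ 'Hello, how are you?', "I don't know you!" ]
```

Note how the apostrophe in "don't" splits the word in two; this is the same behaviour the Word Tokenizer example below shows.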
Word Tokenizer
• A word (Token) is the minimal unit that a machine can understand & process
• Tokenization is the process of splitting the raw string into meaningful tokens – Tokenizer
tokenizes or splits raw text into words
• Natural comes with multiple tokenizers -
▪ Word Tokenizer: a tokenizer that divides a text into sequences of alphabetic and
numeric characters. (Ignores punctuation)
▪ Word Punct Tokenizer: Word + punctuation tokenizer. A tokenizer that divides a text into
sequences of alphabetic and non-alphabetic characters.
▪ Treebank Word Tokenizer: uses regular expressions to tokenize text as in Penn
Treebank
▪ Regexp Tokenizer: Tokenizes text using regular expression patterns.
▪ Aggressive Tokenizer: splits text on any non-alphanumeric character, discarding punctuation.
Word Tokenizer (Cont’d)
var sentence = "Hello, how are you? I don't know you!";
var wordTokenizer = new Natural.WordTokenizer();
var tokens = wordTokenizer.tokenize(sentence);
console.log(tokens);
// prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
var tokenizer = new Natural.WordPunctTokenizer();
tokens = tokenizer.tokenize(sentence);
console.log(tokens);
// prints [ 'Hello', ', ', 'how', 'are', 'you', '? ', 'I', 'don', "'",
//          't', 'know', 'you', '!' ]
tokenizer = new Natural.TreebankWordTokenizer();
tokens = tokenizer.tokenize(sentence);
console.log(tokens);
// prints [ 'Hello', ',', 'how', 'are', 'you', '?', 'I', 'do', "n't",
//          'know', 'you', '!' ]
console.log(new Natural.AggressiveTokenizer().tokenize(sentence));
// prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
Stemming
• Process of reducing inflected or derived words to their word stem, base or root form.
• Similar to cutting the branches of a tree down to its stem
• A fairly crude, rule-based process for grouping together different variations of a token
• Typically strips suffixes such as -s/-es, -ing or -ed
eating, eats, eaten, eat -> eat
stopping, stopped, stops, stop -> stop
ate -> ate (wrong: should be eat)
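The "strip -s/-es, -ing or -ed" idea can be sketched in a few lines of plain JavaScript. This toyStem function is a hypothetical illustration, not the Porter or Lancaster algorithm: it tries a fixed suffix list longest-first, keeps at least three characters of stem, and undoes a doubled final consonant after -ing/-ed (except l, s and z, similar in spirit to a rule in Porter's step 1b).

```javascript
// Toy suffix-stripping stemmer (illustration only, not Porter or Lancaster).
// Suffixes are tried longest first; a stem must keep at least 3 characters.
const SUFFIXES = ["ing", "ed", "es", "s"];

function isConsonant(ch) {
  return /[b-df-hj-np-tv-z]/.test(ch);
}

function toyStem(word) {
  const w = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      let stem = w.slice(0, -suffix.length);
      const last = stem[stem.length - 1];
      // "stopping" -> "stopp" -> "stop", but "falling" -> "fall" (l, s, z kept)
      if ((suffix === "ing" || suffix === "ed") &&
          last === stem[stem.length - 2] &&
          isConsonant(last) && !"lsz".includes(last)) {
        stem = stem.slice(0, -1);
      }
      return stem;
    }
  }
  return w; // no known suffix: return the word unchanged
}
```

With these rules eating, eats and eat all become eat, and stopping, stopped and stops become stop. Being purely rule-based, it leaves the irregular forms ate and eaten untouched, which is exactly the kind of error listed above.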
Stemming (Cont’d)
• Different stemming algorithms -
▪ Lovins Stemmer - The first published stemmer, written by Julie Beth Lovins in 1968. It is not commonly used today.
▪ Porter Stemmer - Written by Martin Porter in July 1980. Very widely used; it became the de facto standard algorithm for English stemming.
▪ Lancaster Stemmer - The Paice/Husk stemmer developed at Lancaster University. Although efficient and easy to implement, it is known to be very strong and aggressive. It uses a single table of rules, each of which may specify the removal or replacement of an ending.
▪ Snowball Stemmer – Also called the Porter2 stemmer, since it is an updated version of the original Porter Stemmer. Natural does not support the Snowball Stemmer.
• Lemmatization is a more robust and methodical way of reducing the grammatical variations of a word to its root.
▪ Natural does not support any lemmatization algorithm.
▪ NLTK and other mature NLP libraries support lemmatization.
Stemming – Porter Stemmer and Lancaster Stemmer
var porterStemmer = Natural.PorterStemmer;
console.log(porterStemmer.stem("ate")); // prints at
console.log(porterStemmer.stem("eating")); // prints eat
console.log(porterStemmer.stem("eats")); // prints eat
console.log(porterStemmer.stem("eat")); // prints eat
console.log(porterStemmer.stem("agreement")); // prints agreement
var lancasterStemmer = Natural.LancasterStemmer;
console.log(lancasterStemmer.stem("ate")); // prints at
console.log(lancasterStemmer.stem("eating")); // prints eat
console.log(lancasterStemmer.stem("eats")); // prints eat
console.log(lancasterStemmer.stem("eat")); // prints eat
console.log(lancasterStemmer.stem("agreement")); // prints agr
• Natural supports Porter Stemmer and Lancaster Stemmer only. It does not support Snowball
Stemmer.
• Both stemmers provide a stem method
Stemming – Porter Stemmer (Non-English Languages)
• Natural also supports the Porter Stemmer for non-English languages
• The following languages are supported:
▪ Farsi - PorterStemmerFa
▪ French - PorterStemmerFr
▪ Russian - PorterStemmerRu
▪ Spanish - PorterStemmerEs
▪ Italian - PorterStemmerIt
▪ Norwegian - PorterStemmerNo
▪ Swedish - PorterStemmerSv
▪ Portuguese - PorterStemmerPt
Lemmatization
• A more methodical way of reducing all the grammatical/inflected forms of a word to its root.
• Uses context and part of speech to determine the inflected form of the word, and applies different normalization rules for each part of speech to get the root word (lemma)
• The Natural NLP library does not support lemmatization.
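Since Natural offers no lemmatizer, here is a minimal plain-JavaScript sketch of the dictionary-plus-POS idea. The LEMMA_TABLE contents and the single fallback rule are hypothetical and deliberately tiny; a real lemmatizer would rely on a full lexical database such as WordNet together with morphological rules. The point is that the part of speech selects the rules: "better" maps to "good" as an adjective but stays "better" as a verb.

```javascript
// Hypothetical miniature lexicon: part of speech -> (inflected form -> lemma).
const LEMMA_TABLE = {
  verb: { ate: "eat", eaten: "eat", eating: "eat", eats: "eat", was: "be", were: "be" },
  noun: { men: "man", mice: "mouse", geese: "goose" },
  adj: { better: "good", best: "good" },
};

function lemmatize(word, pos) {
  const w = word.toLowerCase();
  const byPos = LEMMA_TABLE[pos] || {};
  // 1. Dictionary lookup for the given part of speech (handles irregular forms).
  if (Object.prototype.hasOwnProperty.call(byPos, w)) return byPos[w];
  // 2. Deliberately simplistic rule: regular plural nouns lose a final "s".
  if (pos === "noun" && w.endsWith("s") && !w.endsWith("ss")) return w.slice(0, -1);
  // 3. Otherwise assume the word is already a lemma.
  return w;
}
```

Unlike the stemmers above, the lookup maps "ate" to "eat" (as a verb) and "men" to "man" (as a noun).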
Inflector
• Inflectors are used to pluralize or singularize words
• There are different types of inflectors available in the Natural library
▪ Noun Inflector: pluralizes or singularizes nouns only
▪ Verb Inflector: verbs can be pluralized/singularized with a verb inflector. Natural provides an inflector called PresentVerbInflector, which works on present-tense verbs only
▪ Both the noun and verb inflectors provide singularize and pluralize methods
▪ Number or Count Inflector: forms ordinal numbers from cardinal numbers
▪ Provides a single method called nth, which returns the ordinal form of any number passed to it
Inflector (Cont’d)
// pluralize or singularize nouns only
var nounInflector = new Natural.NounInflector();
console.log(nounInflector.pluralize("Book")); // prints Books
console.log(nounInflector.pluralize("radius")); // prints radii
console.log(nounInflector.singularize("flies")); // prints fly
console.log(nounInflector.singularize("men")); // prints man
var countInflector = Natural.CountInflector;
console.log(countInflector.nth("1")); // prints 1st
console.log(countInflector.nth("2")); // prints 2nd
console.log(countInflector.nth("3")); // prints 3rd
console.log(countInflector.nth("4")); // prints 4th
console.log(countInflector.nth("10")); // prints 10th
var verbInflector = new Natural.PresentVerbInflector();
console.log(verbInflector.singularize("go")); // prints goes
console.log(verbInflector.singularize("run")); // prints runs
console.log(verbInflector.pluralize("becomes")); // prints become
console.log(verbInflector.pluralize("presents")); // prints present
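The count inflector's job can be illustrated with a short plain-JavaScript sketch of the English ordinal rule, including the 11/12/13 exceptions that a naive "look at the last digit" rule gets wrong. This is not Natural's CountInflector source; the standalone nth function is a hypothetical re-implementation that, like the calls above, also accepts numeric strings.

```javascript
// Ordinal form of a non-negative integer (number or numeric string).
function nth(n) {
  const value = Number(n);
  const lastTwo = value % 100;
  // 11, 12 and 13 (and 111, 212, ...) always take "th"
  if (lastTwo >= 11 && lastTwo <= 13) return value + "th";
  switch (value % 10) {
    case 1: return value + "st";
    case 2: return value + "nd";
    case 3: return value + "rd";
    default: return value + "th";
  }
}
```

For example, nth("1") gives "1st", nth(11) gives "11th" and nth(21) gives "21st".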
N-Grams
• An n-gram is a contiguous sequence of n items from a given sample of text or speech.
• The items can be phonemes, syllables, letters, words or base pairs, depending on the application. N-grams are typically collected from a text or speech corpus.
• When the items are words, n-grams may also be called shingles
• An n-gram of size 1 is referred to as a "unigram", size 2 is a "bigram", and size 3 is a "trigram".
• Larger sizes are sometimes referred to by the value of n, e.g., "four-gram", "five-gram", and so on.
Example for "Hello how are you":
▪ unigrams: Hello | how | are | you
▪ bigrams: Hello how | how are | are you
▪ trigrams: Hello how are | how are you
N-Grams (Cont’d)
var sentence = "Hello how are you";
var ngrams = Natural.NGrams;
console.log(ngrams.bigrams(sentence));
// prints [ [ 'Hello', 'how' ], [ 'how', 'are' ], [ 'are', 'you' ] ]
console.log(ngrams.trigrams(sentence));
// prints [ [ 'Hello', 'how', 'are' ], [ 'how', 'are', 'you' ] ]
console.log(ngrams.ngrams(sentence, 1)); // unigram
//prints [ [ 'Hello' ], [ 'how' ], [ 'are' ], [ 'you' ] ]
sentence = "NLTK is a Natural Language Processing Library in Nodejs";
console.log(ngrams.ngrams(sentence, 4)); // four-gram
// prints [ [ 'NLTK', 'is', 'a', 'Natural' ],
//          [ 'is', 'a', 'Natural', 'Language' ],
//          [ 'a', 'Natural', 'Language', 'Processing' ],
//          [ 'Natural', 'Language', 'Processing', 'Library' ],
//          [ 'Language', 'Processing', 'Library', 'in' ],
//          [ 'Processing', 'Library', 'in', 'Nodejs' ] ]
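To see what the NGrams calls compute, here is a minimal sketch of word n-grams as a sliding window of size n over the tokens. It assumes plain whitespace tokenization (Natural's own tokenization may treat punctuation differently), and the ngrams function is a standalone illustration, not Natural's API.

```javascript
// Word n-grams: slide a window of size n across whitespace-separated tokens.
function ngrams(text, n) {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const result = [];
  // A text of k tokens yields k - n + 1 n-grams (none if n > k).
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}
```

For "Hello how are you", n = 2 gives the three bigrams and n = 3 the two trigrams listed in the example above.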
Phonetics
• A phonetic algorithm is an algorithm for indexing words by their pronunciation.
• A phonetic matching algorithm is an algorithm that matches words by their pronunciation rather than their spelling.
• Most phonetic algorithms were developed for use with the English language. Consequently, applying the rules to words in other languages might not give a meaningful result.
• Some well-known phonetic algorithms are:
▪ Soundex - Developed to encode surnames for use in censuses. Soundex codes are four-character strings composed of a single letter followed by three numbers.
▪ Daitch–Mokotoff Soundex - Refinement of Soundex designed to better match surnames of Slavic & Germanic origin. Daitch–Mokotoff Soundex codes are strings composed of six numeric digits.
▪ Cologne phonetics - Similar to Soundex, but more suitable for German words.
▪ Metaphone, Double Metaphone, and Metaphone 3 - Suitable for use with most English words, not just names. Metaphone algorithms are the basis for many popular spell checkers.
▪ New York State Identification and Intelligence System (NYSIIS) - Maps similar phonemes to the same letter. The result is a string that can be pronounced by the reader without decoding.
▪ Match Rating Approach - Developed by Western Airlines in 1977; uses an encoding and a range comparison technique.
▪ Caverphone - Created to assist in data matching between late-19th-century and early-20th-century electoral rolls; optimized for accents present in parts of New Zealand.
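Soundex is simple enough to sketch in a few lines. The function below follows the commonly described American Soundex rules: keep the first letter, map consonants to digits, collapse adjacent letters with the same digit (including the first letter), let h and w not separate such duplicates while vowels do, and pad or truncate to four characters. It is an illustration, not Natural's SoundEx implementation, and the name soundex is hypothetical.

```javascript
// Digit classes of American Soundex; vowels, y, h and w are not coded.
const CODES = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6",
};

function soundex(name) {
  const letters = name.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";
  let code = letters[0].toUpperCase();
  let prev = CODES[letters[0]] || ""; // digit of the previous coded letter
  for (let i = 1; i < letters.length && code.length < 4; i++) {
    const ch = letters[i];
    const digit = CODES[ch];
    if (digit) {
      if (digit !== prev) code += digit; // skip adjacent duplicates
      prev = digit;
    } else if (ch !== "h" && ch !== "w") {
      prev = ""; // a vowel (or y) separates duplicates, so they are coded again
    }
    // h and w are skipped without resetting prev
  }
  return code.padEnd(4, "0");
}
```

Two words "sound alike" under Soundex when their codes are equal: "nuremberg" and "nuremburg" both give N651, while "Paris" (P620) and "Pari" (P600) differ, matching the SoundEx results shown in the next example.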
Phonetic Matching (Cont’d)
• Natural supports Phonetic Matching using three algorithms –
▪ SoundEx
▪ Metaphone
▪ DoubleMetaphone
var metaphone = Natural.Metaphone;
var soundex = Natural.SoundEx;
var doubleMetaphone = Natural.DoubleMetaphone;
// using SoundEx for phonetic matching
console.log(soundex.compare("nuremberg", "nuremburg")); // returns true
console.log(soundex.compare("Paris", "Pari")); // returns false
// using Metaphone for phonetic matching
console.log(metaphone.compare("Fool", "Full")); // returns true
console.log(metaphone.compare("Fool", "Failed")); // returns false
// using Double Metaphone for phonetic matching
console.log(doubleMetaphone.compare("Bangalore", "Bengaluru")); // returns true
console.log(doubleMetaphone.compare("Mumbai", "Bombay")); // returns false
String Distance
• String Distance measures how closely two strings match.
• Natural provides the Jaro-Winkler and Levenshtein distance algorithms for string distance matching
Jaro-Winkler Distance
• The Jaro similarity of two words is computed from the number of matching characters (equal characters that lie within a small window of each other) and the number of transpositions among those matches.
• Jaro-Winkler is a variant of the Jaro distance metric (Matthew A. Jaro, 1989) proposed by William E. Winkler in 1990; it gives extra weight to a common prefix.
• Returns a number between 0 and 1 that tells how closely the strings match (0 = no match, 1 = exact match)
// Using the Jaro-Winkler Distance algorithm
console.log(Natural.JaroWinklerDistance("Hello", "Hello")); // returns 1: exact match
console.log(Natural.JaroWinklerDistance("Me", "You")); // returns 0: no match
console.log(Natural.JaroWinklerDistance("Bangalore", "Bengaluru")); // returns 0.72: partial match
console.log(Natural.JaroWinklerDistance("Mumbai", "Bombay")); // returns 0.66: partial match
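The following plain-JavaScript sketch shows how the Jaro and Jaro-Winkler scores are computed, following the textbook definition: the match window is floor(max length / 2) - 1, t is half the number of matched characters that appear out of order, and Winkler's boost uses a prefix of up to 4 characters with scaling factor p = 0.1. The function names jaro and jaroWinkler are hypothetical; this is not Natural's source, although it reproduces the Mumbai/Bombay (about 0.67) and Bangalore/Bengaluru (about 0.72) figures above.

```javascript
// Jaro similarity: matching characters within a window, penalised by transpositions.
function jaro(s1, s2) {
  if (s1 === s2) return 1;
  const len1 = s1.length, len2 = s2.length;
  if (len1 === 0 || len2 === 0) return 0;
  const range = Math.max(Math.floor(Math.max(len1, len2) / 2) - 1, 0);
  const matched1 = new Array(len1).fill(false);
  const matched2 = new Array(len2).fill(false);
  let matches = 0;
  for (let i = 0; i < len1; i++) {
    const start = Math.max(0, i - range);
    const end = Math.min(i + range + 1, len2);
    for (let j = start; j < end; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk both sequences of matched characters and count positions that differ.
  let k = 0, outOfOrder = 0;
  for (let i = 0; i < len1; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2;
  return (matches / len1 + matches / len2 + (matches - t) / matches) / 3;
}

// Jaro-Winkler: boost the Jaro score by the length of the common prefix (max 4).
function jaroWinkler(s1, s2, p = 0.1) {
  const j = jaro(s1, s2);
  let prefix = 0;
  while (prefix < 4 && prefix < s1.length && prefix < s2.length &&
         s1[prefix] === s2[prefix]) {
    prefix++;
  }
  return j + prefix * p * (1 - j);
}
```

The comparison is case-sensitive here; lower-casing both strings first is a common design choice when case should not matter.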
String Distance - Levenshtein Distance
• The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
• Named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965
• Also referred to as edit distance
// Using Levenshtein Distance algorithm
console.log(Natural.LevenshteinDistance("Hello", "Hello")); // 0
console.log(Natural.LevenshteinDistance("Bangalore", "Bengaluru")); // 3
console.log(Natural.LevenshteinDistance("Mumbai", "Bombay")); // 3
console.log(Natural.LevenshteinDistance("Chennai", "Madras")); // 6
console.log(Natural.LevenshteinDistance("Nuremberg", "Nuremburg")); // 1
Bangalore → Bengaluru: 3 character changes (a→e, o→u, e→u)
Nuremberg → Nuremburg: 1 character change (e→u)
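The edit distance is usually computed with dynamic programming: cell (i, j) holds the distance between the first i characters of one word and the first j characters of the other, built from the three possible last edits. The sketch below is a standard two-row version in plain JavaScript; the function name levenshtein is hypothetical and this is not Natural's source.

```javascript
// Levenshtein distance via dynamic programming, keeping only two rows.
function levenshtein(a, b) {
  // Row 0: turning "" into b[0..j) takes j insertions.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning a[0..i) into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}
```

It runs in O(m × n) time and O(n) memory, and it reproduces the distances listed above (for example 3 for Bangalore/Bengaluru and 6 for Chennai/Madras).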
tf-idf
• tf–idf or TFIDF is short for term frequency - inverse document frequency
• tf-idf determines how important a word (or words) is to a document relative to a corpus.
• Often used as a weighting factor in information retrieval searches, text mining and user modeling.
• The tf-idf value increases proportionally to the number of times a word appears in the
document and is offset by the frequency of the word in the corpus, which helps to adjust for
the fact that some words appear more frequently in general.
• tfidf method returns the measure of importance of a word
var tfidf = new Natural.TfIdf();
// Documents could be added to tf-idf. Here only a single doc is added, but more could be added
tfidf.addDocument("this document is about node. Its also about NLP. Node is used for it");
// Find out the tf-idf of different words in the document
console.log(tfidf.tfidf("node", 0)); // prints 0.61 as node appears multiple times in the doc
console.log(tfidf.tfidf("NLP", 0)); // prints 0.30 as NLP appears only single time
console.log(tfidf.tfidf("ruby", 0)); // prints 0 as ruby does not appear in the doc
console.log(tfidf.listTerms(0));
// prints [ { term: 'node', tfidf: 0.6137056388801094 },
//          { term: 'document', tfidf: 0.3068528194400547 },
//          { term: 'nlp', tfidf: 0.3068528194400547 },
//          { term: 'used', tfidf: 0.3068528194400547 } ]
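The numbers above can be reproduced by hand. This plain-JavaScript sketch assumes raw term counts for tf and idf = 1 + ln(N / (1 + df)), where N is the number of documents and df the number of documents containing the term; with a single document, node gets 2 × (1 + ln(1/2)) ≈ 0.6137, matching the output shown. The TinyTfIdf class and its small stop-word list are hypothetical illustrations, not Natural's implementation, and other libraries use different idf variants.

```javascript
// Tiny illustrative stop-word list (an assumption, not Natural's list).
const STOP_WORDS = new Set(["this", "is", "about", "its", "also", "for", "it", "and", "the", "a"]);

function tokenize(doc) {
  return (doc.toLowerCase().match(/[a-z0-9]+/g) || []).filter((t) => !STOP_WORDS.has(t));
}

class TinyTfIdf {
  constructor() {
    this.docs = []; // one token array per document
  }
  addDocument(text) {
    this.docs.push(tokenize(text));
  }
  // tf: raw number of occurrences of the term in one document
  tf(term, index) {
    return this.docs[index].filter((t) => t === term).length;
  }
  // idf: 1 + ln(N / (1 + df))
  idf(term) {
    const df = this.docs.filter((tokens) => tokens.includes(term)).length;
    return 1 + Math.log(this.docs.length / (1 + df));
  }
  tfidf(term, index) {
    const t = term.toLowerCase();
    return this.tf(t, index) * this.idf(t);
  }
}
```

Adding the three documents from the next example instead gives node a tf-idf of 2 in the first document (tf = 2, idf = 1 + ln(3/3) = 1) and NLP about 1.41 (1 + ln(3/2)), in line with the outputs shown there.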
tf-idf (cont’d)
• Files on disk can also be added to tf-idf
• Multiple documents can be added to tf-idf
var tfidf = new Natural.TfIdf();
// Adding a file from disk to tfidf (it becomes document 0)
tfidf.addFileSync("C:/Data/Profile.txt");
console.log(tfidf.listTerms(0));
// A separate TfIdf instance for the corpus below, so that the document
// indexes 0-2 and the document count match the outputs shown
var corpus = new Natural.TfIdf();
// Multiple documents added to tfidf, which together form the entire corpus
corpus.addDocument('this document is about node. Its also about NLP. Node is used for it');
corpus.addDocument('this document is about ruby.');
corpus.addDocument('this document is about ruby and node.');
console.log(corpus.tfidf("node", 0)); // prints 2
console.log(corpus.tfidf("NLP", 0)); // prints 1.40
console.log(corpus.tfidf("ruby", 0)); // prints 0
console.log(corpus.tfidf("node", 1)); // prints 0 as node does not appear in 2nd doc
console.log(corpus.tfidf("ruby", 1)); // prints 1 as ruby appears in 2nd doc
console.log(corpus.tfidf("node", 2)); // prints 1 as node appears in 3rd doc
console.log(corpus.tfidf("ruby", 2)); // prints 1 as ruby appears in 3rd doc
tf-idf (cont’d)
• The tfidfs method returns the measure of importance of a word in each document of the corpus
• tfidfs accepts the word and a callback, which is called once per document
var tfidf = new Natural.TfIdf();
// Multiple documents added to tfidf, which together form the entire corpus
tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
tfidf.addDocument('this document is about ruby.');
tfidf.addDocument('this document is about ruby and node.');
// tfidfs method is used to find the importance of the word across multiple documents
tfidf.tfidfs('node', function(i, measure) {
  console.log('tf-idf of node in document #' + i + ' is ' + measure);
});
POS (Part of Speech) Tagging
• Process of marking up a word in a text (corpus) as corresponding to a particular part of
speech, based on both its definition and its context—i.e., its relationship with adjacent and
related words in a phrase, sentence, or paragraph.
• Also called grammatical tagging or word-category disambiguation.
POS (Part of Speech) Tagging (Cont’d)
• Current state-of-the-art POS tagging algorithms can predict the POS of a given word with high accuracy (approximately 97%), but a lot of research is still going on in the area of POS tagging.
Penn Treebank POS tags
No.  Tag    Description
1.   CC     Coordinating conjunction
2.   CD     Cardinal number
3.   DT     Determiner
4.   EX     Existential there
5.   FW     Foreign word
6.   IN     Preposition or subordinating conjunction
7.   JJ     Adjective
8.   JJR    Adjective, comparative
9.   JJS    Adjective, superlative
10.  LS     List item marker
11.  MD     Modal
12.  NN     Noun, singular or mass
13.  NNS    Noun, plural
14.  NNP    Proper noun, singular
15.  NNPS   Proper noun, plural
16.  PDT    Predeterminer
17.  POS    Possessive ending
18.  PRP    Personal pronoun
19.  PRP$   Possessive pronoun
20.  RB     Adverb
21.  RBR    Adverb, comparative
22.  RBS    Adverb, superlative
23.  RP     Particle
24.  SYM    Symbol
25.  TO     to
26.  UH     Interjection
27.  VB     Verb, base form
28.  VBD    Verb, past tense
29.  VBG    Verb, gerund or present participle
30.  VBN    Verb, past participle
31.  VBP    Verb, non-3rd person singular present
32.  VBZ    Verb, 3rd person singular present
33.  WDT    Wh-determiner
34.  WP     Wh-pronoun
35.  WP$    Possessive wh-pronoun
36.  WRB    Wh-adverb
POS Tagging – Brill POS Tagger
• Natural supports POS tagging through the Brill POS Tagger, which implements Eric Brill's transformational algorithm (the transformation rules are specified in external files).
• Eric Brill's tagger, one of the most widely used English POS taggers, employs a rule-based algorithm.
const path = require('path');
const Natural = require('natural');
// Folder of the Brill POS tagger inside the installed natural library
var baseFolder = path.join(path.dirname(require.resolve("natural")), "brill_pos_tagger");
// Rules file located in the /data/<language> subfolder under the natural library
var rulesFilename = baseFolder + "/data/English/tr_from_posjs.txt";
// Lexicon file located in the /data/<language> subfolder under the natural library
var lexiconFilename = baseFolder + "/data/English/lexicon_from_posjs.json";
var defaultCategory = 'N';
var lexicon = new Natural.Lexicon(lexiconFilename, defaultCategory);
var rules = new Natural.RuleSet(rulesFilename);
// Any tagger needs a lexicon and rules for successful POS tagging of words
// The Brill POS Tagger object is created from the lexicon and the rule set
var tagger = new Natural.BrillPOSTagger(lexicon, rules);
var sentence = "I see the man with the telescope";
var tokenizer = new Natural.WordTokenizer();
// tokenize the sentence to tokens
var tokens = tokenizer.tokenize(sentence);
console.log(tagger.tag(tokens));
// prints [ [ 'I', 'NN' ],
//          [ 'see', 'VB' ],
//          [ 'the', 'DT' ],
//          [ 'man', 'NN' ],
//          [ 'with', 'IN' ],
//          [ 'the', 'DT' ],
//          [ 'telescope', 'NN' ] ]
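Brill's transformation-based approach works in two stages: first tag each word with its most likely tag from a lexicon (using a default tag for unknown words), then apply an ordered list of context-sensitive transformation rules that correct those initial tags. The plain-JavaScript sketch below illustrates only that idea. Its tiny LEXICON and its single rule (change NN to VB when the previous tag is TO) are hypothetical; it does not read Natural's rule or lexicon files and is not its BrillPOSTagger.

```javascript
// Hypothetical tiny lexicon: most likely tag per word (lower-cased).
const LEXICON = { i: "PRP", want: "VBP", to: "TO", run: "NN", the: "DT", race: "NN", see: "VBP", man: "NN" };
const DEFAULT_TAG = "NN";

// Hypothetical transformation rule: change NN to VB when the previous tag is TO.
const RULES = [{ from: "NN", to: "VB", prevTag: "TO" }];

function tagWords(words) {
  // Stage 1: initial tags from the lexicon, default for unknown words.
  const tagged = words.map((w) => [w, LEXICON[w.toLowerCase()] || DEFAULT_TAG]);
  // Stage 2: apply each rule, in order, left to right across the sentence.
  for (const rule of RULES) {
    for (let i = 1; i < tagged.length; i++) {
      if (tagged[i][1] === rule.from && tagged[i - 1][1] === rule.prevTag) {
        tagged[i][1] = rule.to;
      }
    }
  }
  return tagged;
}
```

For example, "run" starts as NN from the lexicon but is rewritten to VB in "I want to run", while it stays NN in "the run". A trained Brill tagger learns hundreds of such rules from an annotated corpus instead of having them written by hand.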

Weitere ähnliche Inhalte

Was ist angesagt?

Tecnologia da informação infraestrutura de ti
Tecnologia da informação   infraestrutura de tiTecnologia da informação   infraestrutura de ti
Tecnologia da informação infraestrutura de tiVicente Willians Nunes
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
JavaScript Interview Questions and Answers | Full Stack Web Development Train...
JavaScript Interview Questions and Answers | Full Stack Web Development Train...JavaScript Interview Questions and Answers | Full Stack Web Development Train...
JavaScript Interview Questions and Answers | Full Stack Web Development Train...Edureka!
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingRishikese MR
 
Introduction to java
Introduction to javaIntroduction to java
Introduction to javajayc8586
 
Apresentação - Desenvolvimento de software
Apresentação - Desenvolvimento de softwareApresentação - Desenvolvimento de software
Apresentação - Desenvolvimento de softwareCristiano Cunha
 
Natural language processing
Natural language processingNatural language processing
Natural language processingKarenVacca
 
Natural Language Processing
Natural Language Processing Natural Language Processing
Natural Language Processing Adarsh Saxena
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 
Introdução básica ao JavaScript
Introdução básica ao JavaScriptIntrodução básica ao JavaScript
Introdução básica ao JavaScriptCarlos Eduardo Kadu
 
Linguagem de Programação Python
Linguagem de Programação PythonLinguagem de Programação Python
Linguagem de Programação PythonJunior Sobrenome
 

Was ist angesagt? (20)

Tecnologia da informação infraestrutura de ti
Tecnologia da informação   infraestrutura de tiTecnologia da informação   infraestrutura de ti
Tecnologia da informação infraestrutura de ti
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Compilation v. interpretation
Compilation v. interpretationCompilation v. interpretation
Compilation v. interpretation
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
JavaScript Interview Questions and Answers | Full Stack Web Development Train...
JavaScript Interview Questions and Answers | Full Stack Web Development Train...JavaScript Interview Questions and Answers | Full Stack Web Development Train...
JavaScript Interview Questions and Answers | Full Stack Web Development Train...
 
Software
SoftwareSoftware
Software
 
The MEAN Stack
The MEAN StackThe MEAN Stack
The MEAN Stack
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introduction to java
Introduction to javaIntroduction to java
Introduction to java
 
Apresentação - Desenvolvimento de software
Apresentação - Desenvolvimento de softwareApresentação - Desenvolvimento de software
Apresentação - Desenvolvimento de software
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Frappe framework
Frappe framework Frappe framework
Frappe framework
 
Nlp
NlpNlp
Nlp
 
Natural Language Processing
Natural Language Processing Natural Language Processing
Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introdução básica ao JavaScript
Introdução básica ao JavaScriptIntrodução básica ao JavaScript
Introdução básica ao JavaScript
 
Linguagem de Programação Python
Linguagem de Programação PythonLinguagem de Programação Python
Linguagem de Programação Python
 
5. phase of nlp
5. phase of nlp5. phase of nlp
5. phase of nlp
 

Ähnlich wie NLP Basics with Natural JavaScript Library

Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generatorsPaul Kahoro
 
Natural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital HumanitiesNatural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital HumanitiesXiang Li
 
தமிழ்க்கணிமை கட்டமைப்பு
தமிழ்க்கணிமை கட்டமைப்புதமிழ்க்கணிமை கட்டமைப்பு
தமிழ்க்கணிமை கட்டமைப்புBalaSundaraRaman (Sundar)
 
The Holistic Programmer
The Holistic ProgrammerThe Holistic Programmer
The Holistic ProgrammerAdam Keys
 
Os Keysholistic
Os KeysholisticOs Keysholistic
Os Keysholisticoscon2007
 
Preventing Complexity in Game Programming
Preventing Complexity in Game ProgrammingPreventing Complexity in Game Programming
Preventing Complexity in Game ProgrammingYaser Zhian
 
Programming Languages #devcon2013
Programming Languages #devcon2013Programming Languages #devcon2013
Programming Languages #devcon2013Iván Montes
 
CoreML for NLP (Melb Cocoaheads 08/02/2018)
CoreML for NLP (Melb Cocoaheads 08/02/2018)CoreML for NLP (Melb Cocoaheads 08/02/2018)
CoreML for NLP (Melb Cocoaheads 08/02/2018)Hon Weng Chong
 
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...Apache OpenNLP
 
Polyglot Architecture: A Rational Approach to Software Design
Polyglot Architecture: A Rational Approach to Software DesignPolyglot Architecture: A Rational Approach to Software Design
Polyglot Architecture: A Rational Approach to Software Designkompalg
 
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...gagravarr
 
An Introduction to Natural Language Processing
NLP Basics with Natural JavaScript Library

  • 1. Basic Natural Language Processing using Natural (JavaScript/Node) Library Aniruddha Chakrabarti AVP and Chief Architect, Digital, Mphasis @anchakra | Linkedin.com/in/aniruddhac | slideshare.net/aniruddha.chakrabarti/
  • 2. Agenda • Emergence of Artificial Intelligence, AI First • What is Natural Language Processing (NLP) • Natural JavaScript/Node NLP Library • Tokenization - Word Tokenizer • Stemming and Lemmatization • String Distance • Inflectors • Phonetics • N-Grams • Classifier • tf-idf • POS Tagger • Spell Check
  • 3. → Turing Machine → Automating manual processes, tabulating data → Reducing manual effort and time → IBM System/360 (S/360), Mainframes, AS/400 → Computing Power (Moore’s Law) → Systems need to be explicitly programmed using explicit logic and rules (pre-programmed) → Personal Computers (PCs), Communication (Networked PCs, Client/Server, Internet, WWW) → Automating business processes → Mostly structured data → Systems that learn from historical data and can make predictions; not rule-based systems → Use Machine Learning and NLP to analyze unstructured data (text, image, audio, video) → Predictive Analytics, Deep Learning, Neural Nets → OCR, Speech recognition, Text to speech, Face recognition, Video analysis, … → Cognitive Services (pay-as-you-go model) – IBM Watson, Microsoft Cognitive Services, … → Robotics, Internet of Things, Conversational Systems, Wearables, Blur of physical & virtual → Still mostly Weak AI / Narrow AI Third Era of Computing * - AI First/AI Everywhere (Cognitive Systems) * From “The Computing Universe” by Tony Hey and Gyuri Pápay → Strong AI / Full AI → Artificial General Intelligence (AGI) Tabulating Machines 1960 – 1980 Programmable Systems 1980 - 2010 AI First/AI Everywhere (Cognitive Systems) 2010 - Current Real AI ? ? AI Winter AI Summer • Artificial Intelligence has emerged as the third era of computing, after tabulating machines and programmable systems.
  • 4. Gartner Hype Cycle … 2017 • AI technologies such as Cognitive Computing, Virtual Assistants/Chatbots, Conversational AI, Machine Learning, Deep Learning and Autonomous Vehicles appear at the peak of Gartner's Hype Cycle for Emerging Technologies, 2017. • Reinforcement Learning and Artificial General Intelligence (AGI) appear at the start of the hype cycle; they are expected to peak in the coming years.
  • 5. Emergence of “AI Everywhere” • Gartner reckons AI is one of the three megatrends. • AI technologies such as Conversational UI, Machine Learning, Deep Learning and Cognitive Computing constitute “AI Everywhere”.
  • 6. What is Natural Language Processing? • Field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora – Wikipedia • Broadly divided into two areas: ▪ Natural Language Understanding (NLU) ▪ Natural Language Generation (NLG)
  • 7. Some applications of NLP • Spell correction (MS Word or any other editor) • Search engines (Google, Bing, Yahoo, Wolfram Alpha) • Speech engines (Siri, Google Voice, Cortana) • Personal voice assistants (Amazon Alexa, Google Home, …) • Spam classifiers (all e-mail services) • News feeds (Google, Yahoo!, and so on) • Machine translation (Google Translate, and so on) • Chatbots, Intelligent Virtual Agents (IVA) • IBM Watson, Microsoft LUIS, Amazon Lex/Alexa
  • 8. NLP Tools & Libraries • GATE (Java) • Mallet (Java) • Apache OpenNLP (Java) • Apache UIMA • Stanford CoreNLP toolkit (Java) • Gensim (Python) • Natural Language Toolkit / NLTK (Python) – one of the most widely used NLP libraries and tools • spaCy (Python) – an independent, production-oriented library (not built on NLTK) • TextBlob (Python) – built on top of NLTK and Pattern • Natural library (JavaScript/Node)
  • 9. What is Natural • "Natural" is a general natural language processing library for Node.js. • Supports basic NLP tasks such as tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity and inflections • At the moment, most of the algorithms are English-specific • Created by Chris Umbel • Loosely based on the NLTK (Python) NLP library • https://github.com/NaturalNode/natural • http://www.chrisumbel.com/article/node_js_natural_language_porter_stemmer_lancaster_bayes_naive_metaphone_soundex
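  • 10. Natural library install and setup • Install using npm (the package manager for Node). Install it locally in the project so that require can resolve it; the -g switch installs it globally, and Node's require does not look in the global npm folder by default • Include the Natural package through require
    npm install natural
    // include the natural library
    const Natural = require('natural');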
  • 11. Tokenization • A word (token) is the minimal unit that a machine can understand and process. • Tokenization is the process of splitting a raw string into meaningful tokens. • Most further processing (stemming, tagging, classification) works on tokens, so raw text is usually tokenized first. • The complexity of tokenization depends on the needs of the NLP application and on the language itself. ▪ In English it can be as simple as extracting words and numbers with a regular expression. For Chinese and Japanese it is much harder, because words are not separated by spaces. • Two primary types of tokenizers: ▪ Word Tokenizer: splits raw text into words ▪ Sentence Tokenizer: splits raw text into sentences
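For plain English text, a sentence tokenizer can be approximated by splitting after sentence-ending punctuation. This is a naive sketch in plain JavaScript, not Natural's own sentence tokenizer; the regular expression is an assumption and will, for example, wrongly split after abbreviations such as "Mr.".

```javascript
// Naive sentence tokenizer (illustrative only): a sentence is a run of
// characters other than . ! ? followed by one or more of those terminators.
// A trailing fragment without a terminator is kept as its own sentence.
function sentenceTokenize(text) {
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

console.log(sentenceTokenize("Hello, how are you? I don't know you!"));
// [ 'Hello, how are you?', "I don't know you!" ]
```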
  • 12. Word Tokenizer • A word (token) is the minimal unit that a machine can understand and process • Tokenization is the process of splitting a raw string into meaningful tokens; a word tokenizer splits raw text into words • Natural comes with multiple tokenizers: ▪ Word Tokenizer: divides text into sequences of alphabetic and numeric characters (ignores punctuation). ▪ Word Punct Tokenizer: word + punctuation tokenizer; divides text into sequences of alphabetic and non-alphabetic characters. ▪ Treebank Word Tokenizer: uses regular expressions to tokenize text as in the Penn Treebank (for example, contractions such as "don't" are split into "do" and "n't"). ▪ Regexp Tokenizer: tokenizes text using regular expression patterns. ▪ Aggressive Tokenizer:
  • 13. Word Tokenizer (Cont’d)
    const Natural = require('natural');
    const sentence = "Hello, how are you? I don't know you!";

    const wordTokenizer = new Natural.WordTokenizer();
    console.log(wordTokenizer.tokenize(sentence));
    // prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]

    const wordPunctTokenizer = new Natural.WordPunctTokenizer();
    console.log(wordPunctTokenizer.tokenize(sentence));
    // prints [ 'Hello', ', ', 'how', 'are', 'you', '? ', 'I', 'don', "'", 't', 'know', 'you', '!' ]

    const treebankTokenizer = new Natural.TreebankWordTokenizer();
    console.log(treebankTokenizer.tokenize(sentence));
    // prints [ 'Hello', ',', 'how', 'are', 'you', '?', 'I', 'do', "n't", 'know', 'you', '!' ]

    console.log(new Natural.AggressiveTokenizer().tokenize(sentence));
    // prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
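The difference between a word tokenizer and a word + punctuation tokenizer comes down to the regular expression used. The sketch below is plain JavaScript with no Natural dependency and is not Natural's implementation; the ASCII-only character classes are an assumption, so accented or non-Latin letters would be split off.

```javascript
// Word tokenizer: keep runs of ASCII letters/digits and drop everything else.
function wordTokenize(text) {
  return text.match(/[A-Za-z0-9]+/g) || [];
}

// Word + punctuation tokenizer: runs of letters/digits OR runs of
// non-space punctuation (the same idea as the \w+|[^\w\s]+ pattern).
function wordPunctTokenize(text) {
  return text.match(/[A-Za-z0-9]+|[^A-Za-z0-9\s]+/g) || [];
}

const sentence = "Hello, how are you? I don't know you!";
console.log(wordTokenize(sentence));
// [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
console.log(wordPunctTokenize(sentence));
// [ 'Hello', ',', 'how', 'are', 'you', '?', 'I', 'don', "'", 't', 'know', 'you', '!' ]
```

Both versions split "don't" into pieces at the apostrophe; only a contraction-aware tokenizer such as the Treebank one keeps "n't" together.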
  • 14. Stemming • The process of reducing inflected or derived words to their word stem, base or root form. • Similar to cutting the branches of a tree down to its stem. • A fairly crude, rule-based process that groups together different variations of a token. • Typically strips suffixes such as -s/-es, -ing or -ed: eating, eats, eat -> eat; stopping, stopped, stops, stop -> stop • Irregular forms are not handled: a suffix-stripping stemmer such as Porter leaves eaten unchanged, and does not reduce ate to eat (Porter produces at).
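To show why rule-based stemming groups regular forms but misses irregular ones, here is a deliberately crude suffix stripper. It is NOT the Porter or Lancaster algorithm; the suffix list, the minimum stem length of 3 and the doubled-consonant rule are simplifying assumptions made for illustration.

```javascript
// Crude suffix-stripping stemmer (illustrative only, not Porter).
function naiveStem(word) {
  const w = word.toLowerCase();
  for (const suffix of ["ing", "ed", "es", "s"]) {
    // Strip the first matching suffix, but only if at least 3 characters remain.
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      let stem = w.slice(0, -suffix.length);
      // After -ing/-ed, collapse a doubled final consonant (except l, s, z),
      // e.g. "stopp" -> "stop"; loosely modeled on one of Porter's step-1b rules.
      if ((suffix === "ing" || suffix === "ed") && /([^aeiouylsz])\1$/.test(stem)) {
        stem = stem.slice(0, -1);
      }
      return stem;
    }
  }
  return w; // no rule applies: return the word unchanged
}

for (const w of ["eating", "eats", "eaten", "stopping", "stopped", "ate"]) {
  console.log(w, "->", naiveStem(w));
}
```

It maps eating/eats to eat and stopping/stopped to stop, but leaves eaten and ate untouched: no suffix rule can recover an irregular form, which is the gap lemmatization addresses.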
  • 15. Stemming (Cont’d) • Different stemming algorithms: ▪ Lovins Stemmer - The first published stemmer, written by Julie Beth Lovins in 1968. The Lovins stemmer is rarely used today. ▪ Porter Stemmer - Written by Martin Porter and published in July 1980. Very widely used; it became the de facto standard algorithm for English stemming. ▪ Lancaster Stemmer - The Paice/Husk stemmer developed at Lancaster University. Although efficient and easy to implement, it is known to be very strong and aggressive. It uses a single table of rules, each of which may specify the removal or replacement of an ending. ▪ Snowball Stemmer – Also called the Porter2 stemmer: the English stemmer written in Martin Porter's Snowball framework, an updated version of the original Porter stemmer. Natural does not support the Snowball stemmer. • Lemmatization is a more robust and methodical way of reducing grammatical variations of a word to its root form. ▪ Natural does not support any lemmatization algorithm. ▪ NLTK and other mature NLP libraries support lemmatization.
  • 16. Stemming – Porter Stemmer and Lancaster Stemmer
    const Natural = require('natural');

    const porterStemmer = Natural.PorterStemmer;
    console.log(porterStemmer.stem("ate"));       // prints at
    console.log(porterStemmer.stem("eating"));    // prints eat
    console.log(porterStemmer.stem("eats"));      // prints eat
    console.log(porterStemmer.stem("eat"));       // prints eat
    console.log(porterStemmer.stem("agreement")); // prints agreement

    const lancasterStemmer = Natural.LancasterStemmer;
    console.log(lancasterStemmer.stem("ate"));       // prints at
    console.log(lancasterStemmer.stem("eating"));    // prints eat
    console.log(lancasterStemmer.stem("eats"));      // prints eat
    console.log(lancasterStemmer.stem("eat"));       // prints eat
    console.log(lancasterStemmer.stem("agreement")); // prints agr
  • Natural supports only the Porter and Lancaster stemmers; it does not support the Snowball stemmer. • Both stemmers provide a stem method.
  • 17. Stemming – Porter Stemmer (Non-English languages) • Natural also supports the Porter stemmer for some non-English languages • The following languages are supported: ▪ Farsi - PorterStemmerFa ▪ French - PorterStemmerFr ▪ Russian - PorterStemmerRu ▪ Spanish - PorterStemmerEs ▪ Italian - PorterStemmerIt ▪ Norwegian - PorterStemmerNo ▪ Swedish - PorterStemmerSv ▪ Portuguese - PorterStemmerPt
  • 18. Lemmatization • A more methodical way of reducing all the grammatical/inflected forms of a word to its root form (lemma). • Uses context and part of speech to determine the lemma, applying different normalization rules for each part of speech. • The Natural NLP library does not support lemmatization.
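To make the part-of-speech point concrete, here is a toy sketch of lemmatization as dictionary lookup keyed by part of speech. The lexicon below is hypothetical and tiny; real lemmatizers (for example NLTK's WordNet-based one) combine large dictionaries with morphological rules.

```javascript
// Hypothetical, hand-written lexicon: part of speech -> (inflected form -> lemma).
const LEMMAS = {
  verb: { ate: "eat", eaten: "eat", eating: "eat", saw: "see", went: "go", was: "be" },
  noun: { men: "man", mice: "mouse", saw: "saw" },
  adj: { better: "good", best: "good" },
};

// Look the word up in the table for its part of speech;
// fall back to the (lowercased) word itself when there is no entry.
function lemmatize(word, pos) {
  const w = word.toLowerCase();
  const table = LEMMAS[pos] || {};
  return Object.prototype.hasOwnProperty.call(table, w) ? table[w] : w;
}

console.log(lemmatize("ate", "verb"));   // eat
console.log(lemmatize("saw", "verb"));   // see
console.log(lemmatize("saw", "noun"));   // saw
console.log(lemmatize("better", "adj")); // good
```

The same surface form "saw" gets different lemmas depending on its part of speech, which a suffix-stripping stemmer cannot do.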
  • 19. Inflector • Inflectors are used to pluralize or singularize words • Natural provides different types of inflectors: ▪ Noun Inflector: pluralizes or singularizes nouns only ▪ Verb Inflector: pluralizes or singularizes verbs. Natural provides an inflector called PresentVerbInflector, which works on present-tense verbs only ▪ Both the noun and verb inflectors provide singularize and pluralize methods ▪ Number or Count Inflector: forms ordinal numbers (1st, 2nd, 3rd, …) from cardinal numbers ▪ Provides a single method called nth, which returns the ordinal form of the number passed
  • 20. Inflector (Cont’d)
    const Natural = require('natural');

    // pluralize or singularize nouns only
    const nounInflector = new Natural.NounInflector();
    console.log(nounInflector.pluralize("Book"));    // prints Books
    console.log(nounInflector.pluralize("radius"));  // prints radii
    console.log(nounInflector.singularize("flies")); // prints fly
    console.log(nounInflector.singularize("men"));   // prints man

    // form ordinal numbers
    const countInflector = Natural.CountInflector;
    console.log(countInflector.nth(1));  // prints 1st
    console.log(countInflector.nth(2));  // prints 2nd
    console.log(countInflector.nth(3));  // prints 3rd
    console.log(countInflector.nth(4));  // prints 4th
    console.log(countInflector.nth(10)); // prints 10th

    // present-tense verbs: plural form <-> third-person singular form
    const verbInflector = new Natural.PresentVerbInflector();
    console.log(verbInflector.singularize("go"));     // prints goes
    console.log(verbInflector.singularize("run"));    // prints runs
    console.log(verbInflector.pluralize("becomes"));  // prints become
    console.log(verbInflector.pluralize("presents")); // prints present
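The count inflector's job reduces to one rule plus an exception: numbers ending in 1, 2 or 3 take st, nd or rd, except those ending in 11, 12 or 13, which take th like everything else. A minimal sketch of that rule in plain JavaScript (not Natural's source):

```javascript
// Return the English ordinal form of an integer, e.g. 1 -> "1st", 12 -> "12th".
function nth(n) {
  const value = Math.abs(parseInt(n, 10));
  const lastTwo = value % 100;
  let suffix = "th";
  // 11, 12 and 13 (and 111, 212, ...) always take "th".
  if (lastTwo < 11 || lastTwo > 13) {
    suffix = { 1: "st", 2: "nd", 3: "rd" }[value % 10] || "th";
  }
  return `${n}${suffix}`;
}

console.log([1, 2, 3, 4, 11, 12, 13, 21, 102, 111].map((x) => nth(x)).join(" "));
// 1st 2nd 3rd 4th 11th 12th 13th 21st 102nd 111th
```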
  • 21. N-Grams • An n-gram is a contiguous sequence of n items from a given sample of text or speech. • The items can be phonemes, syllables, letters, words or base pairs, depending on the application. N-grams are typically collected from a text or speech corpus. • When the items are words, n-grams may also be called shingles. • An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram"; size 3 is a "trigram". • Larger sizes are sometimes referred to by the value of n, e.g. "four-gram", "five-gram", and so on. • Example for "Hello how are you": unigrams: Hello | how | are | you; bigrams: Hello how | how are | are you; trigrams: Hello how are | how are you
  • 22. N-Grams (Cont’d)
    const Natural = require('natural');
    const ngrams = Natural.NGrams;

    let sentence = "Hello how are you";
    console.log(ngrams.bigrams(sentence));
    // prints [ [ 'Hello', 'how' ], [ 'how', 'are' ], [ 'are', 'you' ] ]
    console.log(ngrams.trigrams(sentence));
    // prints [ [ 'Hello', 'how', 'are' ], [ 'how', 'are', 'you' ] ]
    console.log(ngrams.ngrams(sentence, 1)); // unigrams
    // prints [ [ 'Hello' ], [ 'how' ], [ 'are' ], [ 'you' ] ]

    sentence = "Natural is a Natural Language Processing Library in Nodejs";
    console.log(ngrams.ngrams(sentence, 4)); // four-grams
    // prints [ [ 'Natural', 'is', 'a', 'Natural' ], [ 'is', 'a', 'Natural', 'Language' ], [ 'a', 'Natural', 'Language', 'Processing' ], [ 'Natural', 'Language', 'Processing', 'Library' ], [ 'Language', 'Processing', 'Library', 'in' ], [ 'Processing', 'Library', 'in', 'Nodejs' ] ]
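Natural's NGrams helpers accept a sentence, but the underlying idea is a sliding window over tokens. A minimal sketch in plain JavaScript, assuming the text has already been tokenized (here simply by splitting on whitespace):

```javascript
// Slide a window of size n over a token array and collect every window.
function ngrams(tokens, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const result = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

const tokens = "Hello how are you".split(/\s+/);
console.log(ngrams(tokens, 2));
// [ [ 'Hello', 'how' ], [ 'how', 'are' ], [ 'are', 'you' ] ]
```

A sequence of k tokens yields k - n + 1 n-grams when k ≥ n and none otherwise, which is why the nine-word sentence on the slide produces six four-grams.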
  • 23. Phonetics • A phonetic algorithm is an algorithm for indexing words by their pronunciation. • A phonetic matching algorithm matches words by their pronunciation rather than their spelling. • Most phonetic algorithms were developed for the English language; applying their rules to words in other languages may not give a meaningful result. • Some well-known phonetic algorithms are: ▪ Soundex - Developed to encode surnames for use in censuses. Soundex codes are four-character strings composed of a single letter followed by three digits. ▪ Daitch–Mokotoff Soundex - A refinement of Soundex designed to better match surnames of Slavic and Germanic origin. Daitch–Mokotoff Soundex codes are strings of six digits. ▪ Cologne phonetics - Similar to Soundex, but better suited to German words. ▪ Metaphone, Double Metaphone and Metaphone 3 - Suitable for most English words, not just names. Metaphone algorithms are the basis of many popular spell checkers. ▪ New York State Identification and Intelligence System (NYSIIS) - Maps similar phonemes to the same letter. The result is a string that the reader can pronounce without decoding. ▪ Match Rating Approach - Developed by Western Airlines in 1977; combines an encoding with a range-based comparison technique. ▪ Caverphone - Created to assist in data matching between late 19th-century and early 20th-century electoral rolls; optimized for accents present in parts of New Zealand.
  • 24. Phonetic Matching (Cont’d) • Natural supports phonetic matching using three algorithms: ▪ SoundEx ▪ Metaphone ▪ DoubleMetaphone
    const Natural = require('natural');
    const metaphone = Natural.Metaphone;
    const soundex = Natural.SoundEx;
    const doubleMetaphone = Natural.DoubleMetaphone;

    // using SoundEx for phonetic matching
    console.log(soundex.compare("nuremberg", "nuremburg")); // prints true
    console.log(soundex.compare("Paris", "Pari"));          // prints false

    // using Metaphone for phonetic matching
    console.log(metaphone.compare("Fool", "Full"));   // prints true
    console.log(metaphone.compare("Fool", "Failed")); // prints false

    // using Double Metaphone for phonetic matching
    console.log(doubleMetaphone.compare("Bangalore", "Bengaluru")); // prints true
    console.log(doubleMetaphone.compare("Mumbai", "Bombay"));       // prints false
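A Soundex comparison like the one above amounts to encoding both words and checking whether the codes are equal. The sketch below follows the classic American Soundex rules in plain JavaScript; it is not Natural's source, and its edge-case handling (for example non-letter characters, which are simply dropped) is an assumption.

```javascript
// Letter -> Soundex digit. Vowels, y, h and w have no digit.
const SOUNDEX_CODES = {};
const GROUPS = { 1: "bfpv", 2: "cgjkqsxz", 3: "dt", 4: "l", 5: "mn", 6: "r" };
for (const [digit, letters] of Object.entries(GROUPS)) {
  for (const ch of letters) SOUNDEX_CODES[ch] = digit;
}

function soundex(word) {
  const s = word.toLowerCase().replace(/[^a-z]/g, "");
  if (s.length === 0) return "";
  let code = s[0].toUpperCase();        // keep the first letter
  let prev = SOUNDEX_CODES[s[0]] || ""; // its digit still suppresses an immediate repeat
  for (let i = 1; i < s.length && code.length < 4; i++) {
    const ch = s[i];
    if (ch === "h" || ch === "w") continue; // h/w do not separate equal digits
    const digit = SOUNDEX_CODES[ch] || "";  // vowels and y map to ""
    if (digit !== "" && digit !== prev) code += digit;
    prev = digit; // a vowel resets prev, so equal digits on both sides of it are both kept
  }
  return code.padEnd(4, "0"); // pad to one letter + three digits
}

function soundexCompare(a, b) {
  return soundex(a) === soundex(b);
}

console.log(soundex("Robert"), soundex("Rupert"));     // R163 R163
console.log(soundexCompare("nuremberg", "nuremburg")); // true  (both N651)
console.log(soundexCompare("Paris", "Pari"));          // false (P620 vs P600)
```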
• 25. String Distance
• String distance measures how closely two strings match.
• Natural provides the Jaro–Winkler and Levenshtein distance algorithms for string distance matching.

Jaro–Winkler Distance
• The Jaro similarity of two words is computed from the number of matching characters (the same character appearing in both words within a window of positions) and the number of transpositions (matching characters that appear in a different order); it is not an edit count.
• Jaro–Winkler is a variant of the Jaro distance metric (Matthew A. Jaro, 1989) proposed by William E. Winkler in 1990; it gives extra weight to strings that share a common prefix.
• Natural's JaroWinklerDistance returns a number between 0 and 1 that tells how closely the strings match (0 = no match, 1 = exact match).

```javascript
var Natural = require("natural");

// Using the Jaro–Winkler distance algorithm
console.log(Natural.JaroWinklerDistance("Hello", "Hello")); // 1: exact match
console.log(Natural.JaroWinklerDistance("Me", "You")); // 0: no match
console.log(Natural.JaroWinklerDistance("Bangalore", "Bengaluru")); // 0.72: partial match
console.log(Natural.JaroWinklerDistance("Mumbai", "Bombay")); // 0.66: partial match
```
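Because the Jaro–Winkler score is easy to misread as an edit count, the sketch below computes it from scratch in plain JavaScript. It follows the textbook formulation: characters match when equal and at most ⌊max(|s1|, |s2|) / 2⌋ - 1 positions apart, transpositions are half the matched characters that appear out of order, and the Winkler bonus uses a common prefix of up to four characters with scaling factor p = 0.1. The names `jaro` and `jaroWinkler` are hypothetical, and Natural's own implementation may differ in details such as case handling or when the prefix bonus is applied, so its numbers need not match this sketch exactly.

```javascript
// Jaro similarity between s1 and s2 (1 = identical, 0 = nothing in common).
function jaro(s1, s2) {
  if (s1.length === 0 || s2.length === 0) return 0;
  if (s1 === s2) return 1;
  // Characters match only if equal and at most this many positions apart.
  const matchWindow = Math.max(0, Math.floor(Math.max(s1.length, s2.length) / 2) - 1);
  const matched1 = new Array(s1.length).fill(false);
  const matched2 = new Array(s2.length).fill(false);
  let m = 0;
  for (let i = 0; i < s1.length; i++) {
    const lo = Math.max(0, i - matchWindow);
    const hi = Math.min(s2.length - 1, i + matchWindow);
    for (let j = lo; j <= hi; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        m++;
        break;
      }
    }
  }
  if (m === 0) return 0;
  // Walk the matched characters of both strings in order; every position
  // where they differ counts as half a transposition.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < s1.length; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2;
  return (m / s1.length + m / s2.length + (m - t) / m) / 3;
}

// Jaro–Winkler: boost the Jaro score for a common prefix of up to 4 characters.
function jaroWinkler(s1, s2, p = 0.1) {
  const sim = jaro(s1, s2);
  let prefix = 0;
  while (prefix < 4 && prefix < s1.length && prefix < s2.length && s1[prefix] === s2[prefix]) {
    prefix++;
  }
  return sim + prefix * p * (1 - sim);
}

console.log(jaroWinkler("MARTHA", "MARHTA")); // ≈ 0.961
```

For the classic pair MARTHA and MARHTA there are 6 matches and 1 transposition, giving a Jaro score of 17/18 ≈ 0.944; the shared three-letter prefix lifts the Jaro–Winkler score to about 0.961.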
• 26. String Distance - Levenshtein Distance
• The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
• Named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.
• Also referred to as edit distance.

```javascript
var Natural = require("natural");

// Using the Levenshtein distance algorithm
console.log(Natural.LevenshteinDistance("Hello", "Hello")); // 0
console.log(Natural.LevenshteinDistance("Bangalore", "Bengaluru")); // 3
console.log(Natural.LevenshteinDistance("Mumbai", "Bombay")); // 3
console.log(Natural.LevenshteinDistance("Chennai", "Madras")); // 6
console.log(Natural.LevenshteinDistance("Nuremberg", "Nuremburg")); // 1
```

[Figure: Bangalore → Bengaluru needs 3 character changes (a→e, o→u, e→u); Nuremberg → Nuremburg needs 1 character change (e→u).]
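The definition above is usually computed with dynamic programming. Below is a plain-JavaScript sketch of the standard Wagner–Fischer algorithm that keeps only two rows of the table; the function name `levenshtein` is hypothetical and this is not Natural's source code.

```javascript
// Minimum number of insertions, deletions and substitutions turning a into b.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 characters of a and the first j of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning a[0..i) into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i-1]
        curr[j - 1] + 1,    // insert b[j-1]
        prev[j - 1] + cost  // substitute (free if the characters are equal)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("Chennai", "Madras")); // 6
```

Keeping two rows instead of the full table cuts memory from O(|a|·|b|) to O(|b|) without changing the result.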
• 27. tf-idf
• tf-idf (or TFIDF) is short for term frequency–inverse document frequency.
• tf-idf determines how important a word (or words) is to a document relative to a corpus.
• It is often used as a weighting factor in information retrieval, text mining and user modeling.
• The tf-idf value increases proportionally to the number of times a word appears in the document and is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
• The tfidf method returns the measure of importance of a word in a given document.

```javascript
var Natural = require("natural");

var tfidf = new Natural.TfIdf();

// Add documents to tf-idf. Only a single document is added here, but more could be added.
tfidf.addDocument("this document is about node. Its also about NLP. Node is used for it");

// Find the tf-idf of different words in document 0
console.log(tfidf.tfidf("node", 0)); // ≈ 0.614, as node appears twice in the doc
console.log(tfidf.tfidf("NLP", 0)); // ≈ 0.307, as NLP appears only once
console.log(tfidf.tfidf("ruby", 0)); // 0, as ruby does not appear in the doc

// List the terms of document 0 with their tf-idf
console.log(tfidf.listTerms(0));
// [ { term: 'node', tfidf: 0.6137056388801094 },
//   { term: 'document', tfidf: 0.3068528194400547 },
//   { term: 'nlp', tfidf: 0.3068528194400547 },
//   { term: 'used', tfidf: 0.3068528194400547 } ]
```
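The arithmetic behind these numbers can be reproduced by hand. The sketch below assumes raw term counts for tf and idf = 1 + ln(N / (1 + df)), where N is the number of documents and df the number of documents containing the term; under that assumption node scores 2 × (1 + ln(1/2)) ≈ 0.614 in the single-document corpus, matching the value printed above. The tokenizer is a naive lowercase word splitter (no stop-word removal), and `tokenize` and `tfidfOf` are hypothetical names, not Natural's API.

```javascript
// Naive tokenizer (assumption): lowercase, keep runs of letters and digits.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// tf-idf of `term` in documents[docIndex], assuming raw-count tf
// and idf = 1 + ln(N / (1 + df)).
function tfidfOf(term, docIndex, documents) {
  const target = term.toLowerCase();
  const tokenized = documents.map(tokenize);
  const tf = tokenized[docIndex].filter((w) => w === target).length;
  const df = tokenized.filter((words) => words.includes(target)).length;
  const idf = 1 + Math.log(documents.length / (1 + df));
  return tf * idf;
}

const docs = ["this document is about node. Its also about NLP. Node is used for it"];
console.log(tfidfOf("node", 0, docs)); // ≈ 0.614
console.log(tfidfOf("NLP", 0, docs)); // ≈ 0.307
console.log(tfidfOf("ruby", 0, docs)); // 0
```

The idf term is what lets the same word score differently as the corpus grows: adding documents that also mention node lowers its idf, while rare words such as NLP keep a higher weight.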
• 28. tf-idf (Cont’d)
• Files on disk can also be added to tf-idf.
• Multiple documents can be added to tf-idf; together they form the corpus.

```javascript
var Natural = require("natural");

// Adding a file from disk to tf-idf
var fileTfidf = new Natural.TfIdf();
fileTfidf.addFileSync("C:/Data/Profile.txt");
console.log(fileTfidf.listTerms(0));

// Multiple documents added to a separate tf-idf instance form the entire corpus
var tfidf = new Natural.TfIdf();
tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
tfidf.addDocument('this document is about ruby.');
tfidf.addDocument('this document is about ruby and node.');

console.log(tfidf.tfidf("node", 0)); // 2
console.log(tfidf.tfidf("NLP", 0)); // ≈ 1.405
console.log(tfidf.tfidf("ruby", 0)); // 0
console.log(tfidf.tfidf("node", 1)); // 0, as node does not appear in the 2nd doc
console.log(tfidf.tfidf("ruby", 1)); // 1, as ruby appears in the 2nd doc
console.log(tfidf.tfidf("node", 2)); // 1, as node appears in the 3rd doc
console.log(tfidf.tfidf("ruby", 2)); // 1, as ruby appears in the 3rd doc
```
• 29. tf-idf (Cont’d)
• The tfidfs method returns the measure of importance of a word in each document of the corpus.
• The tfidfs method accepts the word and a callback, which is called once per document with the document index and the measure.

```javascript
var Natural = require("natural");

var tfidf = new Natural.TfIdf();

// Multiple documents added to tf-idf form the entire corpus
tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
tfidf.addDocument('this document is about ruby.');
tfidf.addDocument('this document is about ruby and node.');

// The tfidfs method finds the importance of the word across multiple documents
tfidf.tfidfs('node', function (ctr, measure) {
  console.log('tf-idf of node in document #' + ctr + ' is ' + measure);
});
// tf-idf of node in document #0 is 2
// tf-idf of node in document #1 is 0
// tf-idf of node in document #2 is 1
```
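Conceptually, tfidfs is the single-document measure evaluated once per document. The sketch below mirrors that callback style in plain JavaScript, computing the document frequency once and then scoring every document. As in the earlier hand calculation, it assumes raw-count tf, idf = 1 + ln(N / (1 + df)) and a naive tokenizer; `tfidfsSketch` is a hypothetical name. For the three-document corpus used on these slides it reports 2, 0 and 1 for node.

```javascript
// Naive tokenizer (assumption): lowercase, keep runs of letters and digits.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Calls callback(index, measure) once per document.
function tfidfsSketch(term, documents, callback) {
  const target = term.toLowerCase();
  const tokenized = documents.map(tokenize);
  // df and idf depend only on the corpus, so compute them once for all documents.
  const df = tokenized.filter((words) => words.includes(target)).length;
  const idf = 1 + Math.log(documents.length / (1 + df));
  tokenized.forEach((words, i) => {
    const tf = words.filter((w) => w === target).length;
    callback(i, tf * idf);
  });
}

const corpus = [
  "this document is about node. Its also about NLP. Node is used for it",
  "this document is about ruby.",
  "this document is about ruby and node.",
];
const scores = [];
tfidfsSketch("node", corpus, (i, measure) => {
  scores.push(measure);
  console.log("tf-idf of node in document #" + i + " is " + measure);
});
// tf-idf of node in document #0 is 2
// tf-idf of node in document #1 is 0
// tf-idf of node in document #2 is 1
```

Hoisting the idf computation out of the per-document loop avoids rescanning the whole corpus for every document.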
• 30. POS (Part of Speech) Tagging
• The process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph.
• Also called grammatical tagging or word-category disambiguation.
• 31. POS (Part of Speech) Tagging
• Current state-of-the-art POS tagging algorithms predict the POS of a word with high accuracy (approximately 97% of tokens tagged correctly), but POS tagging is still an active area of research.
• The tags below are the Penn Treebank tag set:

| No. | Tag | Description |
|-----|-----|-------------|
| 1 | CC | Coordinating conjunction |
| 2 | CD | Cardinal number |
| 3 | DT | Determiner |
| 4 | EX | Existential there |
| 5 | FW | Foreign word |
| 6 | IN | Preposition or subordinating conjunction |
| 7 | JJ | Adjective |
| 8 | JJR | Adjective, comparative |
| 9 | JJS | Adjective, superlative |
| 10 | LS | List item marker |
| 11 | MD | Modal |
| 12 | NN | Noun, singular or mass |
| 13 | NNS | Noun, plural |
| 14 | NNP | Proper noun, singular |
| 15 | NNPS | Proper noun, plural |
| 16 | PDT | Predeterminer |
| 17 | POS | Possessive ending |
| 18 | PRP | Personal pronoun |
| 19 | PRP$ | Possessive pronoun |
| 20 | RB | Adverb |
| 21 | RBR | Adverb, comparative |
| 22 | RBS | Adverb, superlative |
| 23 | RP | Particle |
| 24 | SYM | Symbol |
| 25 | TO | to |
| 26 | UH | Interjection |
| 27 | VB | Verb, base form |
| 28 | VBD | Verb, past tense |
| 29 | VBG | Verb, gerund or present participle |
| 30 | VBN | Verb, past participle |
| 31 | VBP | Verb, non-3rd person singular present |
| 32 | VBZ | Verb, 3rd person singular present |
| 33 | WDT | Wh-determiner |
| 34 | WP | Wh-pronoun |
| 35 | WP$ | Possessive wh-pronoun |
| 36 | WRB | Wh-adverb |
• 32. POS Tagging – Brill POS Tagger
• Natural supports POS tagging through the Brill POS Tagger, which implements Eric Brill's transformation-based algorithm (the transformation rules are specified in external files).
• Eric Brill's tagger, one of the most widely used English POS taggers, employs a rule-based algorithm.

```javascript
var Natural = require("natural");
var path = require("path");

// Folder of the Brill POS tagger inside the installed natural library
var baseFolder = path.join(path.dirname(require.resolve("natural")), "brill_pos_tagger");
// Rules file located in the /data/<language> subfolder
var rulesFilename = baseFolder + "/data/English/tr_from_posjs.txt";
// Lexicon file located in the /data/<language> subfolder
var lexiconFilename = baseFolder + "/data/English/lexicon_from_posjs.json";
var defaultCategory = 'N';

// Any tagger needs a lexicon and rules for successful POS tagging of words
var lexicon = new Natural.Lexicon(lexiconFilename, defaultCategory);
var rules = new Natural.RuleSet(rulesFilename);
// The Brill POS Tagger object is created from the lexicon and the rules
var tagger = new Natural.BrillPOSTagger(lexicon, rules);

var sentence = "I see the man with the telescope";
var tokenizer = new Natural.WordTokenizer();
// Tokenize the sentence into tokens
var tokens = tokenizer.tokenize(sentence);
console.log(tagger.tag(tokens));
// [ [ 'I', 'NN' ], [ 'see', 'VB' ], [ 'the', 'DT' ], [ 'man', 'NN' ],
//   [ 'with', 'IN' ], [ 'the', 'DT' ], [ 'telescope', 'NN' ] ]
```
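The core idea of transformation-based tagging can be shown without Natural's data files. In this sketch, each word first receives its most likely tag from a lexicon (or a default tag), and ordered transformation rules then patch those initial tags. The five-word lexicon, the single rule (the textbook example "change NN to VB when the previous tag is TO") and the function name `brillTag` are all hypothetical; the sketch illustrates the technique only and does not use Natural's rule or lexicon file formats.

```javascript
// Hypothetical mini-lexicon: the most likely tag for each known word.
const lexicon = { i: "PRP", want: "VBP", to: "TO", race: "NN", the: "DT" };

// Hypothetical transformation rules, applied in order.
const rules = [
  // Change NN to VB when the previous word is tagged TO ("to race").
  { from: "NN", to: "VB", prevTag: "TO" },
];

function brillTag(tokens, lexicon, rules, defaultTag = "NN") {
  // Step 1: initial tagging from the lexicon, falling back to the default tag.
  const tags = tokens.map((w) => lexicon[w.toLowerCase()] || defaultTag);
  // Step 2: apply each rule left to right over the current tags.
  for (const rule of rules) {
    for (let i = 1; i < tags.length; i++) {
      if (tags[i] === rule.from && tags[i - 1] === rule.prevTag) {
        tags[i] = rule.to;
      }
    }
  }
  return tokens.map((w, i) => [w, tags[i]]);
}

const tagged = brillTag(["I", "want", "to", "race"], lexicon, rules);
console.log(tagged);
// tagged is [ ['I', 'PRP'], ['want', 'VBP'], ['to', 'TO'], ['race', 'VB'] ]
```

In Brill's approach the rules themselves are learned from a tagged corpus by picking, at each step, the transformation that fixes the most remaining errors; here they are simply written by hand.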