This document provides an overview of natural language processing (NLP) and discusses several NLP applications. It introduces NLP and how it helps computers understand human language through examples like Apple's Siri and Google Now. It then summarizes popular NLP toolkits and describes applications including text summarization, information extraction, sentiment analysis, and dialog systems. The document concludes by discussing NLP system development, testing, and evaluation.
2. Why NLP?
• We still have to adapt to the way computers want their data,
• and to the way computers give information back.
• NLP helps computers understand one of the most powerful interfaces humans have: language.
• Apple Siri and Google Now are cutting-edge examples of how NLP helps computers adapt to humans.
• More details: http://www.slideshare.net/yourfrienddhruv/apps-with-ears-and-eyes
3. Google Now vs. Siri vs. Cortana
https://www.stonetemple.com/great-knowledge-box-showdown/
5. Cutting edge NLP!
https://news.ycombinator.com/item?id=8428418
AI Websites That Design Themselves
thegrid.io
6. NLP in today's session
In this session we will focus more on how we
can deal with written language in software
products.
7. NLP for text analysis
• Knowledge is a fundamental requirement for any problem solving.
• An intelligent decision-making system needs three major things:
• A) Lots of relevant knowledge
• B) A way to represent that knowledge in terms of the current problem/question at hand
• C) A way to express the answer in human language.
8. General Architecture of NLP systems
• Basic systems
  • Tokenization -> [lemmatization] -> tagging -> chunking -> domain mapping
  • NLP systems require pre-created, domain-specific corpora (dictionaries plus rule sets handcrafted by humans)
  • Details: http://www.nltk.org/book/ch05.html
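The basic pipeline above can be sketched in plain Python. This is a toy illustration, not how NLTK implements these stages: the `LEMMAS`, `POS_LEXICON` and `DOMAIN` tables are hypothetical hand-made resources standing in for the handcrafted dictionary and rule set, and the chunker only finds simple noun phrases.

```python
import re

# Hypothetical hand-crafted resources (the "dictionary + rule set").
LEMMAS = {"cats": "cat", "sat": "sit", "mats": "mat"}
POS_LEXICON = {"the": "DT", "a": "DT", "cat": "NN", "mat": "NN",
               "black": "JJ", "sit": "VB", "on": "IN"}
DOMAIN = {"cat": "ANIMAL", "mat": "OBJECT"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(tokens):
    """Map each token to its dictionary form when one is known."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


def tag(lemmas):
    """Lookup tagger: words missing from the lexicon default to noun (NN)."""
    return [(lem, POS_LEXICON.get(lem, "NN")) for lem in lemmas]


def chunk_noun_phrases(tagged):
    """Collect runs of DT/JJ/NN tags, keeping a run only if it ends in a noun."""
    chunks, current = [], []
    for word, pos in tagged:
        if pos in ("DT", "JJ", "NN"):
            current.append((word, pos))
            continue
        if current and current[-1][1] == "NN":
            chunks.append(" ".join(w for w, _ in current))
        current = []
    if current and current[-1][1] == "NN":
        chunks.append(" ".join(w for w, _ in current))
    return chunks


def map_to_domain(chunks):
    """Domain mapping: label each chunk by its head (last) word."""
    return [(chunk, DOMAIN.get(chunk.split()[-1], "UNKNOWN")) for chunk in chunks]


tagged = tag(lemmatize(tokenize("The black cats sat on the mats")))
noun_phrases = chunk_noun_phrases(tagged)
print(noun_phrases)                 # ['the black cat', 'the mat']
print(map_to_domain(noun_phrases))  # [('the black cat', 'ANIMAL'), ('the mat', 'OBJECT')]
```

Every stage here depends on a lookup table, which is why hand-built domain resources matter so much for basic systems.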
9. General Architecture of NLP systems
• Advanced systems
  • http://nlp.stanford.edu/software/patternslearning.shtml
10. Relationship to Machine Learning
• NLP
  • Algorithms and tooling aim to convert text/data into values
• ML
  • Algorithms and tooling aim to consume values and produce meaningful values/vectors
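A minimal sketch of that split, assuming a bag-of-words representation (one of many possible choices): the NLP side turns text into count vectors, and the ML side consumes those vectors and produces a similarity value. The function names are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Assign a stable column index to every distinct token."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocab):
    """NLP side: convert text into a vector of term counts."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocab]


def cosine_similarity(u, v):
    """ML side: consume two vectors and produce a single value."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["the cat sat on the mat",
        "the cat lay on the rug",
        "stock prices fell sharply"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]

print(cosine_similarity(vectors[0], vectors[1]))  # high: shared words
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0: no shared words
```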
11. A few popular NLP toolkits
• Python
  • http://www.nltk.org
  • http://scikit-learn.org/
  • https://textblob.readthedocs.org
• Java
  • http://nlp.stanford.edu/software/index.shtml
  • https://gate.ac.uk/overview.html
  • https://opennlp.apache.org/
• R
  • http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
12. Interesting applications
• Covered in this session
• 1) Information summarization
• 2) Information extraction
• 3) Sentiment analysis
• 4) Dialog-based systems
13. 1) Information summarization
• Creates a summary of a large text.
  • http://summly.com/
• You can create a highly personalized summary of the same content for each user
  • http://automatedinsights.com/wordsmith/
• The race is on between 'plagiarism detection' and 'automatic paraphrasing'
  • http://copyscape.com/
  • https://oaps.eu/project/overview/
  • http://plagcontrol.com
• Handy code:
  • Python and related: https://github.com/miso-belica/sumy
  • Java/Scala: https://github.com/MojoJolo/textteaser
• Basics:
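One common baseline, assumed here rather than taken from the tools listed above, is extractive summarization by word frequency: score each sentence by how frequent its content words are across the whole text, then keep the top-scoring sentences in their original order. The stop-word list and the sentence splitter are deliberately naive.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and",
              "in", "it", "that", "on", "for"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercase words of a sentence with stop words removed."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average frequency, so long sentences are not favoured automatically.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


text = ("NLP helps computers understand language. "
        "Language is a powerful interface. "
        "I had lunch today.")
print(summarize(text, max_sentences=1))  # ['Language is a powerful interface.']
```

Libraries such as sumy offer several more refined algorithms; this sketch only shows the general idea of scoring and selecting sentences.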
14. 2) Information extraction
• Named Entity Recognition (NER)
  • Common entity types include ORGANIZATION, PERSON, LOCATION, DATE, TIME, MONEY, and GPE (geo-political entity).
• Relationship extraction
  • Mainly between named entities
  • http://www.cruxbot.com/
• Handy code:
  • http://www.nltk.org/book/ch07.html
• Basics:
  • Find an interesting pair of words (typically named entities) and note the adjoining words to learn the relationship between them.
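The pairing idea can be sketched as follows. The gazetteer below is a hypothetical stand-in for the output of a real NER step, the names in it are made up, and the relation is simply whatever short text sits between two neighbouring entities.

```python
import re

# Hypothetical NER output: (surface text, entity type). Names are invented.
GAZETTEER = [("Alice Smith", "PERSON"),
             ("Acme Corp", "ORGANIZATION"),
             ("Springfield", "LOCATION")]


def find_entities(sentence, gazetteer):
    """Return (start, end, text, type) for every gazetteer hit, left to right."""
    hits = []
    for text, etype in gazetteer:
        for match in re.finditer(re.escape(text), sentence):
            hits.append((match.start(), match.end(), text, etype))
    return sorted(hits)


def extract_relations(sentence, gazetteer, max_gap_words=4):
    """Pair neighbouring entities; the words between them name the relation."""
    hits = find_entities(sentence, gazetteer)
    relations = []
    for left, right in zip(hits, hits[1:]):
        between = sentence[left[1]:right[0]].strip()
        # Long gaps rarely express a direct relation, so they are skipped.
        if between and len(between.split()) <= max_gap_words:
            relations.append((left[2], between, right[2]))
    return relations


sentence = "Alice Smith works for Acme Corp in Springfield."
print(extract_relations(sentence, GAZETTEER))
# [('Alice Smith', 'works for', 'Acme Corp'), ('Acme Corp', 'in', 'Springfield')]
```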
15. 2.1) Information Retrieval
• Large text needs to be searched by keyword.
• Traditional RDBMS indexing doesn't work well for this.
• Use full-text search toolkits, which are good practical examples of NLP in action.
• Handy code:
  • Solr: Java
  • PostgreSQL: DB
  • http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
• Basics:
  • While storing large text, remove words that add little value (stop words such as articles and prepositions) and index only the roots of the remaining words.
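A minimal sketch of that indexing idea, assuming a tiny stop-word list and a crude suffix stripper in place of a real stemmer. It is not how Solr or PostgreSQL work internally; it only illustrates the principle of an inverted index over word roots.

```python
import re
from collections import defaultdict

# Small illustrative stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "for"}


def stem(word):
    """Crude suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Tokenize, drop stop words, and reduce each word to its root."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Inverted index: root word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


documents = {
    1: "Indexing large documents",
    2: "The index is stored in the database",
    3: "Searching documents for keywords",
}
index = build_index(documents)
print(search(index, "indexed documents"))  # {1}
print(search(index, "index"))              # {1, 2}
```

Because queries go through the same `analyze` step as documents, "indexed" and "Indexing" both match the root "index".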
16. 3) Sentiment Analysis
• Understand the overall meaning/tone of a text.
  • e.g. neutral vs. polar, positive vs. negative.
• Demo
  • http://text-processing.com/demo/sentiment/
  • http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
• Use:
  • Is a Twitter trend positive or negative?
  • Is the overall review of a product positive or negative?
• Basics:
  • Pick the most interesting phrases and correlate their meanings.
  • Correlate/group things with similar meaning
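A lexicon-based scorer is one simple way to approximate this. It is an assumption for illustration, not the method behind the demos above. The `LEXICON` weights are invented, and negation only flips the word that immediately follows it.

```python
import re

# Tiny hand-made polarity lexicon with invented weights.
LEXICON = {"good": 1, "great": 2, "love": 2,
           "bad": -1, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def polarity_score(text):
    """Sum the polarity of known words, flipping a word right after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in LEXICON:
            value = LEXICON[tok]
            score += -value if negate else value
        negate = False  # negation only reaches the very next word
    return score


def classify(text):
    """Map the numeric score to positive / negative / neutral."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I love this phone, great battery"))  # positive
print(classify("This is not good"))                  # negative
print(classify("The box arrived on Tuesday"))        # neutral
```

Trained models, such as the one behind the Stanford demo, handle phrases and longer-range negation far better than a word list can.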
17. 4) Dialog based systems
• Understand input given in natural language.
  • Google search, Siri, Google Now
• Build interactive chat bots to handle customer support.
• Details: http://www.nltk.org/book/ch10.html
• Handy code:
  • We can convert a question to a SQL query!
• Basics:
  • Map English grammar to another grammar (for example, a query language) for input parsing, and vice versa
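As a rough sketch of question-to-SQL translation, a single regular expression can stand in for one grammar rule. A real system would use a proper grammar, as the NLTK chapter above describes. The table `city_table`, its columns and its rows are hypothetical, and the query runs against an in-memory SQLite database.

```python
import re
import sqlite3

# One hand-written pattern standing in for a grammar rule:
# "What/Which cities are (located) in <country>?"
QUESTION = re.compile(
    r"^(?:what|which) cities are (?:located )?in (?P<country>[a-z ]+?)\??$",
    re.IGNORECASE,
)


def question_to_sql(question):
    """Return (sql, params) for a supported question, or None otherwise."""
    match = QUESTION.match(question.strip())
    if not match:
        return None
    country = match.group("country").strip().lower()
    # A parameterized query keeps user text out of the SQL string itself.
    return "SELECT city FROM city_table WHERE country = ? ORDER BY city", (country,)


def answer(question, conn):
    """Translate the question, run it, and return the matching city names."""
    translated = question_to_sql(question)
    if translated is None:
        return None
    sql, params = translated
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_table (city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO city_table VALUES (?, ?)",
    [("athens", "greece"), ("beijing", "china"), ("shanghai", "china")],
)

print(answer("What cities are located in China?", conn))  # ['beijing', 'shanghai']
print(answer("How are you?", conn))                       # None
```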
18. Development & Testing/Verifying of NLP systems
• 1) Understand the gold set, training set, and test set
• 2) Seen vs. unseen data
• 3) Accuracy: precision & recall
• 4) Confusion matrices
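Precision, recall and a confusion matrix can be computed by comparing system output with gold labels. The labels below are invented for illustration. Precision is the share of predicted positives that are correct, and recall is the share of gold positives that were found.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count how often each (gold label, predicted label) pair occurs."""
    return Counter(zip(gold, predicted))


def precision_recall(gold, predicted, label):
    """Precision and recall for one label, treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented gold labels and system predictions for six test items.
gold = ["pos", "pos", "pos", "pos", "neg", "neg"]
predicted = ["pos", "pos", "pos", "neg", "pos", "pos"]

matrix = confusion_matrix(gold, predicted)
print(matrix[("pos", "pos")], matrix[("pos", "neg")])  # 3 1
print(matrix[("neg", "pos")], matrix[("neg", "neg")])  # 2 0
print(precision_recall(gold, predicted, "pos"))        # (0.6, 0.75)
```

These numbers are only meaningful when the test set was never seen during training, which is the point of separating seen from unseen data.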
19. Session Summary
• 1) NLP + ML capabilities are the foundation for intelligent systems working with/on consumer data.
• 2) Domain knowledge is the key differentiator and the MAJOR cost factor.
• 3) NLP system development requires a different mindset, as it is not the creation but the evolution of a software system.
• 4) Lots and lots of academic/research reading is a must.
20. What Next? Q&A? Are you sure?
• I have an idea which might require NLP
  • Go reach out to more people:
  • @nikunjness, @yourfrienddhruv
• I want to know how to develop such systems
• I think I want to research more possibilities!
  • Read this: http://www.nltk.org/book/ch01.html
  • Yes, it's Python.
• I think it's too complex.
  • You are not alone.