Getting Started with NLTK
    An Introduction to NLTK


           Sreejith S
     srssreejith@gmail.com
          @tweet2sree

     FOSSMeet 2011, NIC Calicut


       06 February 2011




        Sreejith S   Getting Started with NLTK
Just a word about me!




     Working in Natural Language Processing (NLP), Machine Learning,
     Text Mining
     Active member of ilugcbe, http://ilugcbe.techstud.org
     Works for 365Media Pvt. Ltd., Coimbatore, India
     @tweet2sree, srssreejith@gmail.com




Introduction - NLP



     Natural Language Processing
     NLP is an interdisciplinary subject
         Computer Science
         Linguistics
         Statistics, etc.
     NLP is a subfield of Artificial Intelligence
     NLP: any kind of computer manipulation of natural language
     It is a rapidly developing field of study
     Everyday applications of NLP
         Handwriting recognition, machine translation, question-answering
         systems, spell checkers, grammar checkers, etc.




Natural Language Toolkit (NLTK)



     A collection of Python programs, modules, data sets, and tutorials to
     support research and development in Natural Language Processing
     (NLP)
     Written by Steven Bird, Edward Loper, and Ewan Klein
     NLTK is
         Free and Open source
         Easy to use
         Modular
         Well documented
         Simple and extensible
     http://www.nltk.org




What You Will Learn




     How simple programs can help you manipulate and analyze language
     data, and how to write these programs
     How key concepts from NLP and linguistics are used to describe and
     analyze language
     How data structures and algorithms are used in NLP
     How language data is stored in standard formats, and how data can
     be used to evaluate the performance of NLP techniques




Installation of NLTK


     Make sure that Python 2.4, 2.5, or 2.6 is available on your system
     Install the Python Tkinter package
     Install NumPy, Matplotlib, Prover9, MaltParser, and MegaM
     Download NLTK and install it
         If you are installing NLTK from source, download
         http://nltk.googlecode.com/files/nltk-2.0b9.zip
         Unzip it; this creates the directory nltk-2.0b9
         Open a terminal, cd into this directory, and as superuser run
         python setup.py install
     To install the data
         Start the Python interpreter
         >>> import nltk
         >>> nltk.download()
     Now you are ready to play with NLTK!
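
Before going further, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the original slides; the helper name `is_installed` is hypothetical, and the check only covers packages that expose a Python module (so not Prover9, MaltParser, or MegaM).

```python
def is_installed(module_name):
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


# Report on NLTK and the optional numeric and plotting packages.
for name in ("nltk", "numpy", "matplotlib"):
    status = "found" if is_installed(name) else "missing"
    print("%s: %s" % (name, status))
```

A "missing" line for `nltk` suggests the `python setup.py install` step did not complete for the interpreter you are running.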



NLTK Modules


  NLTK Modules                       Functionality

  nltk.corpus                        Corpora
  nltk.tokenize, nltk.stem           Tokenizers, stemmers
  nltk.collocations                  t-test, chi-squared, mutual information
  nltk.tag                           n-gram, backoff, Brill, HMM, TnT
  nltk.classify, nltk.cluster        Decision tree, naive Bayes, k-means
  nltk.chunk                         Regex, n-gram, named entity
  nltk.parse                         Parsing
  nltk.sem, nltk.inference           Semantic interpretation
  nltk.metrics                       Evaluation metrics
  nltk.probability                   Probability and estimation
  nltk.app, nltk.chat                Applications




Let us start the game

     To access the data used in the book's examples
         Start the Python interpreter
     Some basic exercises from the book
         Concordance: every occurrence of a word, shown in context
              >>> from nltk.book import *
              >>> text1.concordance("monstrous")
         Similar: words that appear in similar contexts
              >>> text1.similar("monstrous")
         Dispersion plot: positional information
              >>> text4.dispersion_plot(["citizens", "democracy",
              ...     "freedom", "duties", "America"])
              >>> text4.dispersion_plot(["and", "to", "of", "with", "the"])
         What do the two plots show, and why do they differ?
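
To make the idea behind a concordance and a dispersion plot concrete, here is a plain-Python sketch that needs no NLTK data. It is an illustration only, not NLTK's implementation; the function names `concordance` and `offsets` and the sample sentence are assumptions made for this example.

```python
def concordance(tokens, word, width=3):
    """Return (left context, hit, right context) for each match of `word`."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def offsets(tokens, word):
    """Return the token positions of `word`: the data a dispersion plot draws."""
    return [i for i, token in enumerate(tokens) if token == word]


tokens = "the monstrous whale and the most monstrous size".split()
for left, hit, right in concordance(tokens, "monstrous"):
    print("%s [%s] %s" % (left, hit, right))
print(offsets(tokens, "monstrous"))
```

Function words such as "the" or "of" produce offsets spread evenly through a text, while topical words cluster, which is one way to read the two dispersion plots above.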



Continued...


     Some basic exercises from the book
         Generate random text in the style of a text
              >>> text3.generate()
         Counting vocabulary: total number of tokens (words and punctuation)
              >>> len(text3)
         List of distinct words, sorted in dictionary order
              >>> sorted(set(text3))
         Count the occurrences of a particular word in a text
              >>> text3.count("and")
         What percentage of the text is taken up by a specific word?
         (100.0 forces floating-point division in Python 2)
              >>> 100.0 * text3.count("and") / len(text3)
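
The same counts can be tried on any list of tokens without loading the book corpora. The sketch below uses only the standard library; the sample sentence and the helper name `percentage` are assumptions for illustration.

```python
from collections import Counter

tokens = "in the beginning god created the heaven and the earth".split()

total = len(tokens)                 # number of tokens, like len(text3)
vocabulary = sorted(set(tokens))    # distinct words in dictionary order
counts = Counter(tokens)            # word -> number of occurrences


def percentage(word):
    """Share of the text taken up by `word`, as a percentage."""
    return 100.0 * counts[word] / total


print("%d tokens, %d distinct words" % (total, len(vocabulary)))
print("'the' occurs %d times (%.1f%%)" % (counts["the"], percentage("the")))
```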




Collocation & Bigram

  Collocation
  A collocation is a sequence of words that occur together unusually often,
  e.g. "red wine" or "strong tea".
  "Strong computer", by contrast, is not a collocation.
       >>> text4.collocations()

  Bigrams
  A list of adjacent word pairs
       >>> text = "sreejith is talking about NLTK"
       >>> wordlist = text.split()
       >>> bigrams(wordlist)
  What happens if we pass the string itself instead of the word list?
       >>> bigrams(text)
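
The question above can be explored with a small stand-in for the bigram function. This is a sketch written for this example, not NLTK's own code: it pairs each item of any sequence with its successor, so a list yields word pairs while a string, being a sequence of characters, yields character pairs.

```python
def bigrams(sequence):
    """Pair each item of `sequence` with the item that follows it."""
    items = list(sequence)
    return list(zip(items, items[1:]))


text = "sreejith is talking about NLTK"
word_pairs = bigrams(text.split())   # pairs of words
char_pairs = bigrams(text)           # pairs of characters, spaces included

print(word_pairs)
print(char_pairs[:4])
```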


Work with our own data

     Build our own corpus with NLTK and analyse it
         >>> from nltk.corpus import PlaintextCorpusReader as ptr
         >>> corpus = '/home/developer/Desktop/Sreejith'
         >>> wordlist = ptr(corpus, '.*')
         >>> wordlist.fileids()
     Let us count the number of characters, words, and sentences in each
     file of the corpus
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.raw(fid)))    # characters
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.words(fid)))  # words
         >>> for fid in wordlist.fileids():
         ...     print(len(wordlist.sents(fid)))  # sentences
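
To see roughly what these three counts measure, here is a standard-library sketch that works on a raw string. The tokenization rules are deliberate simplifications chosen for this example (the function name `text_stats` is hypothetical), so the numbers will generally differ from NLTK's, which for instance counts punctuation marks as separate tokens.

```python
import re


def text_stats(raw):
    """Return (characters, words, sentences) for a raw string.

    Words are runs of word characters; sentences end at '.', '!' or '?'
    followed by whitespace. Both rules are simplifications.
    """
    words = re.findall(r"\w+", raw)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]
    return len(raw), len(words), len(sentences)


sample = "NLTK is fun. It ships with many corpora! Shall we try it?"
print(text_stats(sample))
```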



Continued...


     Plotting a conditional frequency distribution (CFD)
          >>> text = "sreejith is talking about NLTK"
          >>> words = text.split()
          >>> big = bigrams(words)
          >>> gd = nltk.ConditionalFreqDist(big)
          >>> gd.plot()
     Tabulate the CFD
          >>> gd.tabulate()
     Plot a frequency distribution
          >>> fdist = FreqDist(text1)
          >>> fdist.plot(50, cumulative=True)
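
A conditional frequency distribution built from bigrams answers the question "which words follow this word, and how often?". The sketch below shows that data structure with the standard library only; it is an illustration under that reading, not NLTK's class, and `conditional_freq` and the sample sentence are names made up for this example.

```python
from collections import Counter, defaultdict


def conditional_freq(pairs):
    """Map each condition to a Counter of the samples seen with it."""
    cfd = defaultdict(Counter)
    for condition, sample in pairs:
        cfd[condition][sample] += 1
    return cfd


words = "the cat saw the dog and the cat ran".split()
cfd = conditional_freq(zip(words, words[1:]))

# Words that follow "the", most frequent first.
print(cfd["the"].most_common())
```

With a sentence as short as the one on the slide, every bigram occurs once, so a longer text gives a more informative plot or table.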




Normalizing Text



  Stemming
  Stemming is the process of reducing inflected (or sometimes derived)
  words to their stem, base, or root form, generally a written word form
       >>> porter = nltk.PorterStemmer()
       >>> word = 'running'
       >>> porter.stem(word)

       >>> lancaster = nltk.LancasterStemmer()
       >>> lancaster.stem(word)
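
To give a feel for what a stemmer does, here is a deliberately naive suffix stripper. It is not the Porter or Lancaster algorithm; the suffix list and the three-letter minimum are assumptions chosen only to keep the example short.

```python
SUFFIXES = ("ing", "ed", "es", "s")   # hypothetical list, far from complete


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


for w in ("running", "talked", "corpora", "cats"):
    print("%s -> %s" % (w, naive_stem(w)))
```

Real stemmers add many more rules, for example to handle the doubled consonant left behind when "ing" is removed from "running".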




Normalizing Text




  Lemmatization
  Like stemming, but the resulting form must be a known word in a
  dictionary
      >>> wnl = nltk.WordNetLemmatizer()
      >>> wnl.lemmatize(word)




POS Tagging




  POS Tagging
  The process of classifying words into their parts of speech and labeling
  them accordingly is known as part-of-speech tagging, or POS tagging.
       >>> text = nltk.word_tokenize("we are attending "
       ...            "FOSS meet at NIC calicut")
       >>> nltk.pos_tag(text)
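A tagger returns a list of (word, tag) pairs. The plain Python 3 sketch below produces output of that shape with a simple lookup table and a default tag. It is far cruder than the trained tagger behind `nltk.pos_tag`, and the lexicon and the name `lookup_tag` are assumptions made for this example.

```python
# Hypothetical hand-made lexicon using Penn Treebank tag names.
LEXICON = {"we": "PRP", "are": "VBP", "attending": "VBG", "at": "IN", "meet": "NN"}


def lookup_tag(tokens, default="NN"):
    """Tag each token from LEXICON, falling back to default for unknown words."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]


tokens = "we are attending FOSS meet at NIC calicut".split()
tagged = lookup_tag(tokens)
print(tagged)
```

Tagging unknown words as nouns is a common baseline, because nouns are the most likely class for a word the lexicon has never seen.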




Parsing



  Sentence Parsing
  Analyzing the structure of a sentence and building a parse tree. The
  example chunks noun phrases (NP) out of a POS-tagged sentence.

      >>> sentence = [("the", "DT"), ("little", "JJ"),
      ...     ("yellow", "JJ"), ("dog", "NN"), ("barked", "VBD"),
      ...     ("at", "IN"), ("the", "DT"), ("cat", "NN")]
      >>> grammar = "NP: {<DT>?<JJ>*<NN>}"
      >>> cp = nltk.RegexpParser(grammar)
      >>> result = cp.parse(sentence)
      >>> print(result)
      >>> result.draw()
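The tag pattern `<DT>?<JJ>*<NN>` reads as: an optional determiner, any number of adjectives, then a noun. The plain Python 3 sketch below applies the same pattern with the standard `re` module to show which tokens end up in each chunk. It illustrates the idea only and is not NLTK's implementation.

```python
import re

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Encode the tag sequence as "<DT><JJ><JJ><NN>..." so a regex can scan it.
tag_string = "".join("<%s>" % tag for _, tag in sentence)

# Same pattern as the grammar: optional DT, any number of JJ, then NN.
np_pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")

chunks = []
for match in np_pattern.finditer(tag_string):
    # Each tag contributes exactly one "<", so counting them maps
    # character offsets back to token indices.
    start = tag_string.count("<", 0, match.start())
    end = tag_string.count("<", 0, match.end())
    chunks.append([word for word, _ in sentence[start:end]])

print(chunks)
```

The two chunks found are "the little yellow dog" and "the cat", while "barked" and "at" stay outside any noun phrase.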




Machine Translation



  Babelizer Shell
  Machine translation converts a sentence from its source language to a
  specified target language. NLTK provides the babelize_shell() function.
      >>> babelize_shell()
      Babel> hello how are you?
      Babel> german
      Babel> run


      Or just try Google Translate or Yahoo! Babel Fish.




What can you do?




     Contribute to NLTK
     Google Summer of Code (GSoC)
     NLP training
     Real time research




Reference




     Steven Bird, Edward Loper and Ewan Klein
     Natural Language Processing with Python
     Jacob Perkins
     Python Text Processing with NLTK 2.0 Cookbook
     http://www.nltk.org




Questions




And finally...




                                                         Sreejith.S



Introduction to NLTK

  • 1. Getting Started with NLTK An Introduction to NLTK Sreejith S srssreejith@gmail.com @tweet2sree FOSSMeet 2011,NIC Calicut 06 February 2011 Sreejith S Getting Started with NLTK
  • 2. Just a word about me !! Working in Natural Language Processing (NLP), Machine Learning, Text Mining Active member of ilugcbe , http://ilugcbe.techstud.org Works for 365Media Pvt. Ltd. Coimbatore India. @tweet2sree , srssreejith@gmail.com Sreejith S Getting Started with NLTK
  • 3. Introduction - NLP Natural Language Processing Sreejith S Getting Started with NLTK
  • 4. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Sreejith S Getting Started with NLTK
  • 5. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Sreejith S Getting Started with NLTK
  • 6. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Sreejith S Getting Started with NLTK
  • 7. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... Sreejith S Getting Started with NLTK
  • 8. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence Sreejith S Getting Started with NLTK
  • 9. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. Sreejith S Getting Started with NLTK
  • 10. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Sreejith S Getting Started with NLTK
  • 11. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Everyday applications of NLP Sreejith S Getting Started with NLTK
  • 12. Introduction - NLP Natural Language Processing NLP is an inter-disciplinary subject Computer Science Linguistics Statistics etc... NLP is a sub field of Artificial Intelligence NLP - Any kind of computer manipulation of natural language. It is a rapidly developing field of study Everyday applications of NLP Handwriting recognition,Machine translation,Question-answering systems,Spell checkers,Grammer checkers etc... Sreejith S Getting Started with NLTK
  • 13. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Sreejith S Getting Started with NLTK
  • 14. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien Sreejith S Getting Started with NLTK
  • 15. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Sreejith S Getting Started with NLTK
  • 16. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Sreejith S Getting Started with NLTK
  • 17. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Sreejith S Getting Started with NLTK
  • 18. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Sreejith S Getting Started with NLTK
  • 19. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Well documented Sreejith S Getting Started with NLTK
  • 20. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Well documented Simple and extensible Sreejith S Getting Started with NLTK
  • 21. Natural Language Toolkit (NLTK) A collection of Python programs, modules, data set and tutorial to support research and development in Natural Language Processing (NLP) Written by Steven Bird, Edvard Loper and Ewan Klien NLTK is Free and Open source Easy to use Modular Well documented Simple and extensible http://www.nltk.org Sreejith S Getting Started with NLTK
  • 22. What You Will Learn How simple programs can help you manipulate and analyze language data, and how to write these programs Sreejith S Getting Started with NLTK
  • 23. What You Will Learn How simple programs can help you manipulate and analyze language data, and how to write these programs How key concepts from NLP and linguistics are used to describe and analyze language Sreejith S Getting Started with NLTK
  • 24. What You Will Learn How simple programs can help you manipulate and analyze language data, and how to write these programs How key concepts from NLP and linguistics are used to describe and analyze language How data structures and algorithms are used in NLP Sreejith S Getting Started with NLTK
  • 25. What You Will Learn How simple programs can help you manipulate and analyze language data, and how to write these programs How key concepts from NLP and linguistics are used to describe and analyze language How data structures and algorithms are used in NLP How language data is stored in standard formats, and how data can be used to evaluate the performance of NLP techniques Sreejith S Getting Started with NLTK
  • 26. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Sreejith S Getting Started with NLTK
  • 27. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Sreejith S Getting Started with NLTK
  • 28. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Install Numpy, Matplotlib, Prover9, MaltParse and MegaM Sreejith S Getting Started with NLTK
  • 29. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Install Numpy, Matplotlib, Prover9, MaltParse and MegaM Download NLTK and Install it Sreejith S Getting Started with NLTK
  • 30. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Install Numpy, Matplotlib, Prover9, MaltParse and MegaM Download NLTK and Install it If you are installing NLTK from source Download http://nltk.googlecode.com/files/nltk-2.0b9.zip Sreejith S Getting Started with NLTK
  • 31. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Install Numpy, Matplotlib, Prover9, MaltParse and MegaM Download NLTK and Install it If you are installing NLTK from source Download http://nltk.googlecode.com/files/nltk-2.0b9.zip Unzip it , It will create nltk-2.0b9 . Sreejith S Getting Started with NLTK
  • 32. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Install Numpy, Matplotlib, Prover9, MaltParse and MegaM Download NLTK and Install it If you are installing NLTK from source Download http://nltk.googlecode.com/files/nltk-2.0b9.zip Unzip it , It will create nltk-2.0b9 . Open terminal and cd in to this folder, Be super user , python setup.py install Sreejith S Getting Started with NLTK
  • 33. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Install Numpy, Matplotlib, Prover9, MaltParse and MegaM Download NLTK and Install it If you are installing NLTK from source Download http://nltk.googlecode.com/files/nltk-2.0b9.zip Unzip it , It will create nltk-2.0b9 . Open terminal and cd in to this folder, Be super user , python setup.py install To install data Sreejith S Getting Started with NLTK
  • 34. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Install Numpy, Matplotlib, Prover9, MaltParse and MegaM Download NLTK and Install it If you are installing NLTK from source Download http://nltk.googlecode.com/files/nltk-2.0b9.zip Unzip it , It will create nltk-2.0b9 . Open terminal and cd in to this folder, Be super user , python setup.py install To install data Start python interpreter >>> import nltk >>> nltk.download() Sreejith S Getting Started with NLTK
  • 35. Installation of NLTK Make sure that Ptyhon 2.4 or 2.5 or 2.6 is available in your system Install Python Tkinter package Install Numpy, Matplotlib, Prover9, MaltParse and MegaM Download NLTK and Install it If you are installing NLTK from source Download http://nltk.googlecode.com/files/nltk-2.0b9.zip Unzip it , It will create nltk-2.0b9 . Open terminal and cd in to this folder, Be super user , python setup.py install To install data Start python interpreter >>> import nltk >>> nltk.download() Now you are ready to play with NLTK !!! Sreejith S Getting Started with NLTK
  • 36. NLTK Modules NLTK Modules Functionality Sreejith S Getting Started with NLTK
  • 37. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus Sreejith S Getting Started with NLTK
  • 38. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers Sreejith S Getting Started with NLTK
  • 39. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info Sreejith S Getting Started with NLTK
  • 40. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT Sreejith S Getting Started with NLTK
  • 41. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT nltk.classify,nltk.cluster Decision tree,Naive bayes,K-means Sreejith S Getting Started with NLTK
  • 42. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT nltk.classify,nltk.cluster Decision tree,Naive bayes,K-means nltk.chunk Regex,n-gram,named entity Sreejith S Getting Started with NLTK
  • 43. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT nltk.classify,nltk.cluster Decision tree,Naive bayes,K-means nltk.chunk Regex,n-gram,named entity nltk.parsing Parsing Sreejith S Getting Started with NLTK
  • 44. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT nltk.classify,nltk.cluster Decision tree,Naive bayes,K-means nltk.chunk Regex,n-gram,named entity nltk.parsing Parsing nltk.sem,nltk.interence Semantic interpretation Sreejith S Getting Started with NLTK
  • 45. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT nltk.classify,nltk.cluster Decision tree,Naive bayes,K-means nltk.chunk Regex,n-gram,named entity nltk.parsing Parsing nltk.sem,nltk.interence Semantic interpretation nltk.metrics Evaluation metrics Sreejith S Getting Started with NLTK
  • 46. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT nltk.classify,nltk.cluster Decision tree,Naive bayes,K-means nltk.chunk Regex,n-gram,named entity nltk.parsing Parsing nltk.sem,nltk.interence Semantic interpretation nltk.metrics Evaluation metrics nltk.probability Probability & Estimation Sreejith S Getting Started with NLTK
  • 47. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT nltk.classify,nltk.cluster Decision tree,Naive bayes,K-means nltk.chunk Regex,n-gram,named entity nltk.parsing Parsing nltk.sem,nltk.interence Semantic interpretation nltk.metrics Evaluation metrics nltk.probability Probability & Estimation nltk.app,nltk.chat Applications Sreejith S Getting Started with NLTK
  • 48. NLTK Modules NLTK Modules Functionality nltk.corpus Courpus nltk.tokenize,nltk.stem Tokenizers,stemmers nltk.collocations t-test,chi-squared,mutual-info nltk.tag n-gram,backoff,Brill,HMM,TnT nltk.classify,nltk.cluster Decision tree,Naive bayes,K-means nltk.chunk Regex,n-gram,named entity nltk.parsing Parsing nltk.sem,nltk.interence Semantic interpretation nltk.metrics Evaluation metrics nltk.probability Probability & Estimation nltk.app,nltk.chat Applications Sreejith S Getting Started with NLTK
  • 49. Let us start the game To access data for working out the example in the book Start python interpreter Sreejith S Getting Started with NLTK
  • 50. Let us start the game To access data for working out the example in the book Start python interpreter Some basic work outs from the book Sreejith S Getting Started with NLTK
  • 51. Let us start the game To access data for working out the example in the book Start python interpreter Some basic work outs from the book Concordance Sreejith S Getting Started with NLTK
  • 52. Let us start the game To access data for working out the example in the book Start python interpreter Some basic work outs from the book Concordance >>> from nltk.book import * >>> text1.concordance("monstrous") Sreejith S Getting Started with NLTK
  • 53. Let us start the game To access data for working out the example in the book Start python interpreter Some basic work outs from the book Concordance >>> from nltk.book import * >>> text1.concordance("monstrous") Similar Sreejith S Getting Started with NLTK
  • 54. Let us start the game To access data for working out the example in the book Start python interpreter Some basic work outs from the book Concordance >>> from nltk.book import * >>> text1.concordance("monstrous") Similar >>> text1.similar("monstrous") Sreejith S Getting Started with NLTK
  • 55. Let us start the game To access data for working out the example in the book Start python interpreter Some basic work outs from the book Concordance >>> from nltk.book import * >>> text1.concordance("monstrous") Similar >>> text1.similar("monstrous") Dispersion plot - Positional information Sreejith S Getting Started with NLTK
  • 56. Let us start the game To access data for working out the example in the book Start python interpreter Some basic work outs from the book Concordance >>> from nltk.book import * >>> text1.concordance("monstrous") Similar >>> text1.similar("monstrous") Dispersion plot - Positional information >>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"]) >>> text4.dispersion_plot(["and", "to", "of", "with", "the"]) What is it !!! Why ??? Sreejith S Getting Started with NLTK
  • 65. Continued...
    Some basic exercises from the book:
    Generate
      >>> text3.generate()
    Counting vocabulary
      >>> len(text3)
    List of distinct words, sorted in dictionary order
      >>> sorted(set(text3))
    Count the occurrences of a particular word in a text
      >>> text3.count("and")
    What percentage of the text is taken up by a specific word
      >>> 100.0 * text3.count("and") / len(text3)
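The same counts can be tried on an ordinary list of tokens, without loading any corpus. The sketch below uses a made-up token list in place of text3; multiplying by a float keeps the percentage from being truncated under Python 2 integer division.

```python
# Hypothetical token list standing in for text3.
tokens = ("in the beginning god created the heaven and the earth "
          "and the earth was without form").split()

total = len(tokens)                       # number of tokens, like len(text3)
vocabulary = sorted(set(tokens))          # distinct words in dictionary order
count_and = tokens.count("and")           # occurrences of one word
percent_and = 100.0 * count_and / total   # share of the text taken up by "and"

print(total, len(vocabulary), count_and, percent_and)
```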
  • 71. Collocation & Bigram
    Collocation
    A collocation is a sequence of words that occur together unusually often, e.g. "red wine" or "strong tea". "Strong computer" is not a collocation.
      >>> text4.collocations()
    Bigrams
    A list of word pairs
      >>> from nltk import bigrams
      >>> text = "sreejith is talking about NLTK"
      >>> wordlist = text.split()
      >>> list(bigrams(wordlist))
    What will happen if I do this?
      >>> list(bigrams(text))
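The question on this slide can be answered with a few lines of plain Python. The helper `pairwise_bigrams` below is a hypothetical stand-in for NLTK's `bigrams`; it simply pairs each item of a sequence with its successor, so what you get depends on what the sequence contains.

```python
def pairwise_bigrams(sequence):
    """Pair each item of a sequence with the item that follows it."""
    items = list(sequence)
    return list(zip(items, items[1:]))


text = "sreejith is talking about NLTK"

word_pairs = pairwise_bigrams(text.split())  # sequence of words -> word pairs
char_pairs = pairwise_bigrams(text)          # a string is a sequence of characters

print(word_pairs)
print(char_pairs[:4])
```

Passing the raw string therefore yields character pairs rather than word pairs, which is why the text is split into a word list first.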
  • 75. Work with our own data
    Populate our own corpus with NLTK and analyse it
      >>> from nltk.corpus import PlaintextCorpusReader as ptr
      >>> corpus = '/home/developer/Desktop/Sreejith'
      >>> wordlist = ptr(corpus, '.*')
      >>> wordlist.fileids()
    Let us find out how to count the number of characters, words and sentences in the corpus
      >>> for fid in wordlist.fileids(): print(len(wordlist.raw(fid)))
      >>> for fid in wordlist.fileids(): print(len(wordlist.words(fid)))
      >>> for fid in wordlist.fileids(): print(len(wordlist.sents(fid)))
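For readers without a corpus directory at hand, the per-file counts can be sketched with the standard library alone. This is only an approximation of what a corpus reader does: the function `corpus_stats`, the file names, and the naive regular expressions for words and sentences are assumptions, and NLTK's tokenizers will generally give different numbers on real text.

```python
import re
import tempfile
from pathlib import Path


def corpus_stats(root):
    """Map each .txt file under `root` to (characters, words, sentences)."""
    stats = {}
    for path in sorted(Path(root).glob("*.txt")):
        raw = path.read_text(encoding="utf-8")
        words = re.findall(r"\w+", raw)                                  # naive word split
        sentences = [s for s in re.split(r"[.!?]+", raw) if s.strip()]   # naive sentence split
        stats[path.name] = (len(raw), len(words), len(sentences))
    return stats


# Build a tiny throwaway corpus so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    Path(root, "a.txt").write_text("NLTK is fun. Try it!", encoding="utf-8")
    Path(root, "b.txt").write_text("One sentence only.", encoding="utf-8")
    stats = corpus_stats(root)

print(stats)
```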
  • 81. Continued...
    Plotting a conditional frequency distribution
      >>> import nltk
      >>> from nltk import bigrams, FreqDist
      >>> text = "sreejith is talking about NLTK"
      >>> words = text.split()
      >>> big = bigrams(words)
      >>> gd = nltk.ConditionalFreqDist(big)
      >>> gd.plot()
    Tabulate the CFD
      >>> gd.tabulate()
    Plot a frequency distribution
      >>> fdist = FreqDist(text1)
      >>> fdist.plot(50, cumulative=True)
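A conditional frequency distribution built from bigrams is, in essence, one counter per first word. The sketch below shows that idea with the standard library; `conditional_freq` and the sample sentence are illustrative assumptions, not NLTK's `ConditionalFreqDist` itself.

```python
from collections import Counter, defaultdict


def conditional_freq(pairs):
    """Count outcomes separately for each condition in (condition, outcome) pairs."""
    cfd = defaultdict(Counter)
    for condition, outcome in pairs:
        cfd[condition][outcome] += 1
    return cfd


# Hypothetical sentence with repeated words, so the counts are not all 1.
words = "the cat saw the dog and the cat ran".split()
cfd = conditional_freq(zip(words, words[1:]))

print(cfd["the"].most_common())   # words that follow "the", most frequent first
print(dict(cfd["cat"]))           # words that follow "cat"
```

In the slide's five-word sentence every word occurs once, so each condition has a single outcome with count 1; a text with repeated words gives a more informative plot or table.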
  • 84. Normalizing Text
    Stemming
    Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form.
      >>> import nltk
      >>> porter = nltk.PorterStemmer()
      >>> word = 'running'
      >>> porter.stem(word)
      >>> lancaster = nltk.LancasterStemmer()
      >>> lancaster.stem(word)
  • 87. Normalizing Text
    Lemmatization
    Lemmatization is stemming plus a check that the resulting form is a known word in a dictionary.
      >>> import nltk
      >>> wnl = nltk.WordNetLemmatizer()
      >>> word = 'running'
      >>> wnl.lemmatize(word, pos='v')   # pos defaults to noun
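The difference between stemming and lemmatization can be illustrated with a deliberately tiny toy. Nothing here resembles the Porter or Lancaster rule sets or WordNet: the suffix list, the `KNOWN_WORDS` set and both functions are hypothetical, chosen only to show that a stemmer may return a non-word while a lemmatizer only accepts dictionary forms.

```python
SUFFIXES = ("ing", "ed", "es", "s")      # toy suffix list, far simpler than Porter's rules
KNOWN_WORDS = {"run", "walk", "pass"}    # hypothetical stand-in for a dictionary such as WordNet


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def toy_lemmatize(word):
    """Accept a stem only if it is a known word; otherwise leave the word unchanged."""
    stem = toy_stem(word)
    if stem in KNOWN_WORDS:
        return stem
    # Undo a doubled final consonant, e.g. "runn" -> "run".
    if len(stem) > 1 and stem[-1] == stem[-2] and stem[:-1] in KNOWN_WORDS:
        return stem[:-1]
    return word


for w in ("running", "walked", "passes", "this"):
    print(w, toy_stem(w), toy_lemmatize(w))
```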
  • 90. POS Tagging
    The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, or POS tagging.
      >>> import nltk
      >>> text = nltk.word_tokenize("we are attending FOSS meet at NIC calicut")
      >>> nltk.pos_tag(text)
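To show what "labeling words with their parts of speech" means in the simplest case, here is a toy lookup tagger. It is not how `nltk.pos_tag` works, which relies on a trained model; the `LEXICON` entries and the fallback tag are assumptions made for the example.

```python
# Hypothetical hand-made lexicon using Penn Treebank style tags.
LEXICON = {"we": "PRP", "are": "VBP", "attending": "VBG", "at": "IN"}


def lookup_tag(tokens, default="NN"):
    """Tag each token from the lexicon, falling back to `default` for unknown words."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]


tagged = lookup_tag("we are attending FOSS meet at NIC calicut".split())
print(tagged)
```

The result has the same shape as the output of a real tagger: a list of (word, tag) pairs.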
  • 93. Parsing
    Sentence parsing
    Analyzing sentence structure and creating a parse tree
      >>> import nltk
      >>> sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
      >>> grammar = "NP: {<DT>?<JJ>*<NN>}"
      >>> cp = nltk.RegexpParser(grammar)
      >>> result = cp.parse(sentence)
      >>> print(result)
      >>> result.draw()
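The chunk rule `NP: {<DT>?<JJ>*<NN>}` reads as "an optional determiner, any number of adjectives, then a noun". A rough way to see it at work, without NLTK, is to match an ordinary regular expression against the sequence of tags. The function `np_chunks` is a hypothetical sketch of that idea, not the algorithm `RegexpParser` uses, and it returns flat word lists instead of a tree.

```python
import re


def np_chunks(tagged):
    """Find word spans whose tags match <DT>?<JJ>*<NN> in a list of (word, tag) pairs."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")
    chunks = []
    for match in pattern.finditer(tag_string):
        start = tag_string[:match.start()].count("<")   # index of the first token in the span
        length = match.group().count("<")               # number of tokens in the span
        chunks.append([word for word, _ in tagged[start:start + length]])
    return chunks


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

print(np_chunks(sentence))
```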
  • 97. Machine Translation
    Babelizer shell
    Translating a sentence from its source language to a specified language. NLTK provides a babelize shell.
      >>> babelize_shell()
      Babel> hello how are you?
      Babel> german
      Babel> run
    Also try Google Translate and Yahoo Babel Fish.
  • 98. What can you do?
    Contribute to NLTK
    GSoC
    NLP training
    Real-time research
  • 99. Reference
    Steven Bird, Edward Loper and Ewan Klein, Natural Language Processing with Python
    Jacob Perkins, Python Text Processing with NLTK 2.0 Cookbook
    http://www.nltk.org
  • 100. Questions
  • 101. And finally... Sreejith.S