Natural language
processing (NLP)
introduction
Robert Lujo
About me
• software
• professionally for 18 years
• python >= 2.0, django >= 0.96
• freelancer
• … (linkedin)
NLP is …
NLP
Natural language processing (NLP)
a field of computer science … concerned with the
interactions
between computers and human (natural)
languages.
https://en.wikipedia.org/wiki/Natural_language_processing
NLP
“between computers and human (natural)
languages”
1. computer -> human language
2. human language -> computer
NLP trend
• the Internet is a huge and easily accessible source
of information
• BUT - information is mainly unstructured
• usually simple scraping (scrapy) is sufficient, but
sometimes it is not
• NLP solves or helps in converting free text
(unstructured information) into a structured form
NLP goals
some examples
NLP goals - group 1
• cleanup, tokenization
• stemming
• lemmatization
• part-of-speech tagging
• query expansion
• sentence segmentation
NLP goals - group 2
• information extraction
• named entity recognition (NER)
• sentiment analysis
• word sense disambiguation
• text similarity
NLP goals - group 3
• machine translation
• automatic summarisation
• natural language generation
• question answering
NLP goals - group 4
• optical character recognition (OCR)
• speech processing
• speech recognition
• text-to-speech
NLP theory
Word, term, feature
• word <> term
• a document or text chunk is a unit / entity / object!
• terms are features of the document!
• each term has properties:
• normalized form -> term.baseform + term.transformation
• position(s) in the document -> term.position(s)
• frequency -> term.frequency
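The term properties listed above can be sketched as a small data structure. This is only an illustration: the `Term` class and the `index_terms` helper are hypothetical names, normalization is reduced to lowercasing, and `term.transformation` is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """One term (feature) of a document, with the properties named on the slide."""
    baseform: str
    positions: list = field(default_factory=list)

    @property
    def frequency(self):
        # A term's frequency is simply the number of positions where it occurs.
        return len(self.positions)

def index_terms(words):
    """Map each normalized word to a Term that records where it occurs."""
    terms = {}
    for position, word in enumerate(words):
        base = word.lower()  # placeholder normalization, not real stemming
        terms.setdefault(base, Term(base)).positions.append(position)
    return terms

terms = index_terms("The dog saw the other dog".split())
print(terms["dog"].positions, terms["dog"].frequency)
```

Keeping the positions makes the frequency free to compute, and the positions are what later slides need for neighbours and phrases.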
Text, document, chunk
• what is a document?
• text segmentation
• hard problem
• usually we consider the whole document as one
unit (entity)
Terms, features
• converting words -> terms
• term frequency is usually the most important feature!
• how to get the list of terms with frequencies:
• preprocessing - e.g. remove everything but words, remove stopwords,
tokenization (regexp)
• word normalization
dog ~ dogs, zeleno ~ najzelenijih (Croatian: green ~ of the greenest)
• .lower(), regexp, stemming, lemmatization
• much harder for inflectional languages, e.g. Croatian, see text-hr :)
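The pipeline above (preprocessing, normalization, counting) can be sketched with the standard library alone. The stopword list and the "strip a trailing s" rule are assumptions made for the demo; they stand in for a real stopword list and a real stemmer or lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "in"}

def term_frequencies(text):
    """Return a Counter mapping each term to its frequency in the text."""
    # Lowercase and keep only runs of ASCII letters (drops digits and punctuation).
    words = re.findall(r"[a-z]+", text.lower())
    # Crude normalization: strip a trailing "s" so that dogs -> dog.
    # This is an assumption for the demo, not a real stemmer.
    terms = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]
    return Counter(t for t in terms if t not in STOPWORDS)

freqs = term_frequencies("The dog chased the dogs, and the cat is in the garden.")
print(freqs.most_common(3))
```

A rule this simple would not work for an inflectional language such as Croatian, which is exactly the point of the last bullet.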
Term weight - TF-IDF
• term frequency – inverse document frequency
• variables:
• t - term,
• d - one document
• D - all documents
• TF - term frequency of t in d - i.e. a measure of how
much information the term brings in one document
• IDF - inverse document frequency of t in D - i.e. the rarer
the term is across all documents (the corpus), the higher
its weight
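A minimal sketch of the weighting, using the variables t, d and D from the slide. TF-IDF has several variants; the one assumed here is relative frequency for TF and log(|D| / df(t)) for IDF, without smoothing. Libraries may use different formulas.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # TF: relative frequency of t in d; IDF: log(|D| / df(t)).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    ["dog", "barks", "dog"],
    ["cat", "meows"],
    ["dog", "and", "cat"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document "barks" ends up with a higher weight than "dog", although "dog" occurs twice: "barks" appears in only one document of the corpus, "dog" in two.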
Terms position, syntax
• sometimes term position is important
• neighbours, collocation, phrase extraction, NER
• from regexp to parsers
• syntax trees
• complex, cpu intensive
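The cheapest way to use term position is to count neighbouring pairs (bigrams). The sketch below is a rough collocation signal only; it assumes the text is already tokenized and does no statistical scoring or parsing.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent token pairs (a very rough collocation signal)."""
    # zip pairs every token with its right-hand neighbour.
    return Counter(zip(tokens, tokens[1:]))

tokens = "new york is big and new york is busy".split()
counts = bigram_counts(tokens)
print(counts[("new", "york")])
```

Pairs that occur together more often than their individual frequencies would suggest are candidates for phrases or named entities.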
Terms position, syntax
In their public lectures they have even claimed that the only
evidence that Khufu built the pyramid is the graffiti found in the five
chambers.
Bag of words
Bag of words
• simplified and effective way to process
documents by:
• disregarding grammar (term.baseform?)
• disregarding word order (term.position)
• keeping only multiplicity (term.frequency)
Bag of words
• sparse matrix
• numbers can be:
• binary - 0/1
• simple term frequency
• weight - e.g. TF-IDF
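A small sketch of the document-term matrix, with either counts or binary flags as cell values. It builds a dense list of lists for readability, which is an assumption of the demo; with a real vocabulary most cells are zero, hence the sparse matrix on the slide.

```python
from collections import Counter

def bag_of_words(docs, binary=False):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    vocabulary = sorted({term for doc in docs for term in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        # Counter returns 0 for terms that are missing from this document.
        row = [counts[term] for term in vocabulary]
        if binary:
            row = [1 if value else 0 for value in row]
        matrix.append(row)
    return vocabulary, matrix

docs = [["dog", "bites", "man"], ["man", "bites", "dog", "dog"]]
vocab, counts = bag_of_words(docs)
_, flags = bag_of_words(docs, binary=True)
print(vocab)
print(counts)
```

Word order is gone: with binary values the two documents become identical rows.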
Bag of words
• very simple -> very fast
• frequently used:
• in index servers
• in database for simple full-text-search
operations
• for processing of large datasets
NLP techniques
Machine learning
• NLP is one of the applications of machine learning
• after text is converted to entities with features,
machine learning techniques can be applied
Machine learning
• categorisation of ML algorithm families
• supervised - classification (discrete labels), regression (numerical values)
• unsupervised - clustering
• many different methods / algorithm families, statistical,
probabilistic, …
decision trees, neural networks / deep learning,
support vector machines, bayesian networks,
markov models, genetic algorithms
Machine learning
Usual NLP methods
• Naive Bayes
• Markov models
• SVM
• Neural networks / Deep learning
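As an illustration of the first method in the list, here is a sketch of a multinomial Naive Bayes text classifier on bag-of-words input. The class name, the toy training data and the add-one smoothing are assumptions for the demo; in practice a library implementation would be used.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocabulary = {term for doc in docs for term in doc}
        return self

    def predict(self, doc):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # log P(label) + sum of log P(term | label), in log space
            # to avoid underflow on long documents.
            score = math.log(n_docs / total_docs)
            for term in doc:
                if term in self.vocabulary:  # terms never seen in training are ignored
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_docs = [
    ["great", "fun", "film"],
    ["boring", "slow", "film"],
    ["great", "acting"],
    ["slow", "and", "boring"],
]
train_labels = ["pos", "neg", "pos", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["great", "film"]))
```

The "naive" part is the assumption that terms are independent given the label, which is the same simplification bag of words makes.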
NLP libraries
mainly python
Basic string manipulation
• keep it simple and stupid
.lower(), .strip(), .split(), .join(),
iterators, …
• regexp
• not only match, but also transformation, extraction (1),
backreferences etc.
• flags (e.g. re.MULTILINE), repl can be a function:
def repl(m): …
re.sub("pattern", repl, "string")
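A runnable sketch of both ideas from this slide: `re.sub` with a function as `repl`, and a backreference inside the pattern. The date format and the sample strings are made up for the demo.

```python
import re

# Three capture groups: day, month, year, as in "7.1.2015".
DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")

def repl(match):
    """Rewrite 'D.M.YYYY' as ISO 'YYYY-MM-DD' using the captured groups."""
    day, month, year = match.group(1), match.group(2), match.group(3)
    return f"{year}-{int(month):02d}-{int(day):02d}"

text = "Released 7.1.2015, updated 15.03.2015."
iso_text = DATE.sub(repl, text)
print(iso_text)

# Backreference: \1 inside the pattern matches whatever group 1 matched,
# so this collapses an immediately repeated word.
deduped = re.sub(r"\b(\w+) \1\b", r"\1", "the the cat")
print(deduped)
```

A function as `repl` is useful whenever the replacement depends on the matched text, as with the zero padding here.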
NLTK
http://www.nltk.org/
the biggest, the most popular, the most comprehensive,
comes with a free book
Scikit-Learn
http://scikit-learn.org/stable/index.html
machine learning in python
spaCy
http://honnibal.github.io/spaCy/
new kid on the block - 2015-01
text processing in Python and Cython
“… industrial-strength NLP …
… the fastest NLP software …”
Stanford NLP
• http://nlp.stanford.edu/software/index.shtml
• statistical NLP, deep learning NLP, and rule-based
NLP tools for major computational
linguistics problems
• famous
• Java
Misc …
• data analysis libraries - numpy, pandas, matplotlib,
shapely …
• parsers - BLLIP, pyparsing, parserator
• MonkeyLearn service …
• Java, C/C++
• effective memory representation, permanent storage etc.
• lots of free resources - books, reddit, blogs, etc.
all done …
Thank you for your patience
Q/A?
robert.lujo@gmail.com
@trebor74hr
