NLP?
Natural language processing is the process of building computational
models for understanding and generating natural language. It studies the problems
of automated generation and understanding of natural human
languages. NLP includes natural-language-generation systems that
convert information from computer databases into normal human
language and natural-language-understanding systems that convert
samples of human language into more formal representations that
are easier for computer programs to manipulate.
NLP involves multiple disciplines, including artificial intelligence
techniques, multivariate analysis, logical inference, statistics,
linguistics and any other technique that can be used to process,
generate or interpret language with computers.
To understand this field, it is essential to know the meaning of
the terms it uses. These terms can refer either to the processes used
in the field or to definitions of different kinds of information.
These kinds of information are:
1- repositories of knowledge containing linguistic information,
real-world facts and the different kinds of relations that can be
found in language.
2- specifications describing kinds of content that provide
information about different aspects of texts, and how to obtain
that content from texts.
Machine learning and NLP
Machine learning is a subfield of artificial intelligence (AI) concerned
with algorithms that allow computers to learn.
We can view NLP as “an extension of machine learning” or “a
special kind of machine learning”. Both build models from
algorithms and datasets and then use these trained models to
process new data.
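A minimal sketch of this build-then-apply pattern, assuming a multinomial Naive Bayes classifier with add-one smoothing and a plain whitespace tokenizer. These choices, the function names and the tiny training set are illustrative assumptions, not taken from the original: the model is built once from labeled examples and then reused to label new text.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """Build a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Apply the already-built model to new, unseen text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Log prior plus summed log likelihoods with add-one (Laplace) smoothing.
        score = math.log(n_docs / total_docs)
        for token in text.lower().split():
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled dataset, invented for illustration.
training_data = [
    ("great movie loved it", "pos"),
    ("what a wonderful film", "pos"),
    ("terrible plot boring acting", "neg"),
    ("awful film hated it", "neg"),
]
model = train_naive_bayes(training_data)
print(classify(model, "loved this wonderful movie"))  # pos
```

Swapping in a different learning algorithm would leave the overall train-then-apply structure unchanged.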
Machine learning provides natural language processing with a range
of alternative learning algorithms as well as additional general
approaches and methodologies.
NLP also introduces new learning frameworks and techniques for
tasks ranging from information retrieval and extraction, through
speech recognition, to syntax, semantics and language
understanding. It also draws on three theoretical paradigms
(learning-theoretic, probabilistic and information-theoretic), the
relations among them, and the main algorithmic techniques developed
within these paradigms and used in key natural language applications.
The two NLP approaches
1. Statistical NLP: comprises all quantitative approaches to
automated language processing, including probabilistic
modeling, information theory, and linear algebra.[6] The
technology for statistical NLP comes mainly from machine
learning and data mining, both of which are fields of artificial
intelligence that involve learning from data.
2. Linguistically oriented NLP: based on large repositories that contain
information about texts, for example a list of synonyms, a
taxonomy, definitions of the grammatical rules of languages, etc.
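The following minimal sketch makes "probabilistic modeling" in the statistical approach concrete. It is a bigram language model estimated by maximum likelihood from a toy corpus. The `<s>`/`</s>` sentence-boundary markers are a common convention, while the corpus and function names are illustrative assumptions. Unseen bigrams get probability zero here; practical systems avoid this with smoothing.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(word | previous word) by maximum likelihood from a toy corpus."""
    history_counts = Counter()  # how often each token appears as a history
    bigram_counts = Counter()   # how often each (previous, word) pair appears
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob

def sentence_probability(prob, sentence):
    """Multiply the bigram probabilities along the sentence (no smoothing)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= prob(prev, word)
    return p

# Toy corpus, invented for illustration.
corpus = ["I like green tea", "I like coffee", "you like tea"]
prob = train_bigram_model(corpus)
print(prob("i", "like"))                          # 1.0
print(sentence_probability(prob, "I like tea"))   # 2/3 * 1 * 1/3 * 1 = 2/9
```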
Major tasks in NLP
- Automatic summarization: Produce a readable summary of a
chunk of text. Often used to provide summaries of text of a known
type, such as articles in the financial section of a newspaper.
- Machine translation: Automatically translate text from one human
language to another. This is one of the most difficult problems,
and is a member of a class of problems colloquially termed
"AI-complete", i.e. requiring all of the different types of knowledge
that humans possess (grammar, semantics, facts about the real
world, etc.) in order to solve properly.
- Part-of-speech tagging: Given a sentence, determine the part of
speech for each word. Many words, especially common ones, can
serve as multiple parts of speech. For example, "book" can be a
noun ("the book on the table") or verb ("to book a flight"); "set"
can be a noun, verb or adjective; and "out" can be any of at least
five different parts of speech. Note that some languages have
more such ambiguity than others. Languages with little inflectional
morphology, such as English, are particularly prone to such
ambiguity. Chinese is prone to such ambiguity because it is a tonal
language: the inflection carried by tone in speech is not readily
conveyed by the written orthography used to express the intended
meaning.
- Parsing: Determine the parse tree (grammatical analysis) of a
given sentence. The grammar for natural languages is ambiguous,
and typical sentences have multiple possible analyses. In fact,
perhaps surprisingly, for a typical sentence there may be thousands
of potential parses (most of which will seem completely
nonsensical to a human).
- Sentiment analysis: Extract subjective information, usually from a
set of documents, often using online reviews to determine the
"polarity" of opinions about specific objects. It is especially useful
for identifying trends of public opinion in social media for
marketing purposes.
- Topic segmentation and recognition: Given a chunk of text,
separate it into segments, each of which is devoted to a topic, and
identify the topic of each segment.
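A quick count can check the claim in the parsing item that a sentence may have thousands of potential parses. Assume only unlabeled binary trees and ignore the grammar entirely. Under those assumptions, the number of ways to bracket an n-word sentence is the Catalan number C(n-1). A real grammar rules many of these trees out but can add ambiguity through category labels, so this sketch illustrates how fast the number grows rather than giving an exact parse count. The function name is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_binary_trees(n_words):
    """Number of distinct binary bracketings of a sequence of n_words words."""
    if n_words <= 1:
        return 1
    # Split the span into a left part of k words and a right part of n_words - k words.
    return sum(count_binary_trees(k) * count_binary_trees(n_words - k)
               for k in range(1, n_words))

for n in (3, 5, 10, 15):
    print(n, count_binary_trees(n))
# 3 2
# 5 14
# 10 4862
# 15 2674440
```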
Some NLP-specific vocabulary and its meaning
Linguistics is the scientific and philosophical study of language,
encompassing a number of sub-fields. At the core of theoretical
linguistics is the study of language structure (grammar) and the
study of meaning (semantics). The first of these encompasses
morphology (the formation and composition of words) and syntax
(the rules that determine how words combine into phrases and
sentences).
A controlled vocabulary is a list of terms that have been
enumerated explicitly. This list is controlled by and is available from a
controlled vocabulary registration authority. All terms in a controlled
vocabulary should have an unambiguous, non-redundant definition.
Named entity recognition is a subtask of information extraction
that seeks to locate and classify atomic elements in text into
predefined categories such as the names of persons, organizations,
locations, expressions of times, quantities, monetary values,
percentages, etc.
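The simplest way to illustrate locating and classifying such elements is a lookup-based sketch. It uses a gazetteer (a list of known names) plus regular expressions for a couple of numeric categories. The gazetteer entries, patterns and function name below are hypothetical. Practical NER systems usually rely on statistical sequence models rather than fixed lists.

```python
import re

# Hypothetical gazetteer: a hand-made lookup table of known names and their categories.
GAZETTEER = {
    "Barack Obama": "PERSON",
    "United Nations": "ORGANIZATION",
    "Paris": "LOCATION",
}

# Illustrative patterns for two non-name categories (far from complete).
PATTERNS = [
    (re.compile(r"\$\d+(?:\.\d+)?"), "MONEY"),
    (re.compile(r"\d+(?:\.\d+)?%"), "PERCENT"),
]

def recognize_entities(text):
    """Return (entity text, category, start offset) tuples, sorted by position."""
    entities = []
    for name, category in GAZETTEER.items():
        # \b keeps "Paris" from matching inside a longer word such as "Parisian".
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            entities.append((match.group(), category, match.start()))
    for pattern, category in PATTERNS:
        for match in pattern.finditer(text):
            entities.append((match.group(), category, match.start()))
    return sorted(entities, key=lambda entity: entity[2])

text = "Barack Obama spoke at the United Nations in Paris; costs rose 5% to $300."
for entity in recognize_entities(text):
    print(entity)
```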
A taxonomy is a collection of controlled vocabulary terms
organized into a hierarchical (tree-shaped) structure. Each term in
the taxonomy is in one or more parent-child relationships. By
definition, a child type has the same constraints as its parent type
plus one or more additional constraints. For example, car is a
child of vehicle, so any car is also a vehicle, but not every vehicle is a
car. There are also specialized kinds of taxonomies, such as an
“enterprise taxonomy”, which contains only terms related to one
specific field. Taxonomies are seen as less broad
than ontologies because ontologies include logical inference and
allow a larger variety of relation types.
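The car/vehicle example can be expressed as a small sketch. It assumes each term maps to its list of parent terms. The toy taxonomy and the `is_a` helper are hypothetical; "is a" here simply means reachable by following parent links upward.

```python
# Hypothetical toy taxonomy: each term maps to its parent terms (child -> parents).
TAXONOMY = {
    "vehicle": [],
    "car": ["vehicle"],
    "bicycle": ["vehicle"],
    "electric car": ["car"],
}

def is_a(term, ancestor):
    """True if `term` equals `ancestor` or reaches it by following parent links."""
    if term == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in TAXONOMY.get(term, []))

print(is_a("car", "vehicle"))           # True: every car is a vehicle
print(is_a("vehicle", "car"))           # False: not every vehicle is a car
print(is_a("electric car", "vehicle"))  # True: inherited through "car"
```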
An ontology is a formal representation of a set of concepts within a
domain and the relationships between those concepts. It is used to
reason about the properties of that domain, and may be used to
define the domain. Ontologies are a form of knowledge representation.
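Unlike a taxonomy, an ontology can mix several relation types and apply logical rules to them. The following is a minimal sketch under two assumptions: concepts and relationships are stored as (subject, relation, object) triples, and there is a single hypothetical inference rule (is_a is transitive). Real ontologies are usually written in dedicated languages such as OWL and processed by reasoners. The triples and function name are illustrative.

```python
# Hypothetical toy ontology stored as (subject, relation, object) triples.
TRIPLES = {
    ("car", "is_a", "vehicle"),
    ("vehicle", "is_a", "artifact"),
    ("car", "has_part", "wheel"),
    ("wheel", "made_of", "rubber"),
}

def infer_is_a(triples):
    """Add every is_a link implied by transitivity, repeating until nothing new appears."""
    closure = set(triples)
    while True:
        new = {(a, "is_a", c)
               for (a, r1, b) in closure if r1 == "is_a"
               for (b2, r2, c) in closure if r2 == "is_a" and b2 == b}
        if new <= closure:
            return closure
        closure |= new

facts = infer_is_a(TRIPLES)
print(("car", "is_a", "artifact") in facts)  # True: derived by inference
print(("car", "made_of", "rubber") in facts)  # False: the rule only chains is_a
```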
Part-of-speech (POS) tagging is a process whereby tokens are
sequentially labeled with syntactic labels, such as "finite verb" or
"gerund" or "subordinating conjunction".
Morphology is the study of the internal structure of words.
The term "word" has two senses, and the distinction between them is
arguably the most important one in morphology. In the first sense of
"word", the one in which dog and dogs are "the same word", the word
is called a lexeme. In the second sense, dog and dogs are two
different word-forms of that lexeme. One form is chosen
conventionally to represent the canonical form of a lexeme; this is
its lemma, so dog and dogs share the lemma dog. A stemmer is used to
reduce words to a common stem (also called root), which often, but
not always, coincides with the lemma. A lexicon is the collection of
all the lexemes of a language.
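A deliberately naive sketch shows the difference between a stem and a lemma. The suffix rules and the lemma dictionary below are hypothetical and tiny. Real stemmers (such as the Porter stemmer) use much richer rule sets, and real lemmatizers rely on a full lexicon plus part-of-speech information.

```python
# Hypothetical suffix rules, checked in this order.
SUFFIXES = ["ing", "ed", "es", "s"]

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma dictionary: word-form -> canonical form (lemma) of its lexeme.
LEMMAS = {"dogs": "dog", "mice": "mouse", "ran": "run", "running": "run"}

def lemmatize(word):
    """Look up the lemma; unknown word-forms are returned unchanged."""
    return LEMMAS.get(word, word)

for form in ["dog", "dogs", "mice", "running"]:
    print(form, naive_stem(form), lemmatize(form))
# dog dog dog
# dogs dog dog
# mice mice mouse
# running runn run
```

The stem "runn" is not a real word, while the lemma "run" is. A stemmer only needs related forms to collapse to the same string, whereas a lemma must be the canonical form of the lexeme.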
Grammar is the field of linguistics that covers the rules governing
the use of any given spoken language. It mainly
includes morphology and syntax, but it can be complemented with
other linguistic fields.
Syntax is the study of the principles and rules for constructing
sentences in natural languages; the term syntax is also used to refer
directly to the rules and principles that govern sentence
structure. Semantics is the study of the meaning of signs.
These studies can be performed at the word level, sentence level,
paragraph level, and even at the level of larger units of discourse.
A corpus is a large and structured set of texts used for statistical
analysis, text mining, validation of linguistic rules, calculation of
document similarities, etc.
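One common way to compute document similarity over a corpus is the cosine similarity of bag-of-words term-frequency vectors. The following minimal sketch assumes whitespace tokenization and raw counts, with no stop-word removal or TF-IDF weighting. The three-document corpus is invented for illustration.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two bag-of-words term-frequency vectors."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Toy corpus, invented for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
print(cosine_similarity(corpus[0], corpus[1]))
print(cosine_similarity(corpus[0], corpus[2]))
```

The two sentences about a cat score about 0.75 (6 / 8, up to floating-point rounding) because they share "the", "cat" and "on". The unrelated third document scores 0.0.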
Slow but well-organized video introduction:
http://www.youtube.com/watch?v=bDPULOFFlaI
