Introduction to Text Mining and Analytics
Copyright © Ivy Professional School - 2009-10 (All Rights Reserved)
Wiki Definition
Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text.
Source of Text Data
Organizations today encounter textual data in their day-to-day business. Sources of such data include electronic text, call center logs, social media, corporate documents, research papers, application forms, service notes, emails, etc.
Unstructured Data
• “80% of business-relevant information originates in unstructured form, primarily text.” (quoted in 2008)
• “Based on the industry’s current estimations, unstructured data will occupy 90% of the data by volume in the entire digital space over the next decade.” (quoted in 2010)
Text Mining and Analytics
• Text analytics uses algorithms to turn free-form text (unstructured data) into data that can be analyzed (structured data) by applying statistical and machine learning methods, as well as Natural Language Processing (NLP) techniques.
• Once structured data is obtained, the same mining and analytic techniques used for any other structured data can be applied.
• The most significant part of text mining/analytics is therefore how to convert text into structured data.
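To make "structured data" concrete, here is a minimal Python sketch of one common target representation, a bag-of-words document-term matrix: one row per document, one column per distinct term, each cell holding a count. It assumes the texts are already tokenized (here simply by whitespace); the function names and sample sentences are illustrative, not taken from the slides.

```python
from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> list[str]:
    """Collect every distinct token in the corpus, sorted for a stable column order."""
    return sorted({tok for doc in docs for tok in doc})


def document_term_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Turn tokenized documents into rows of term counts (one column per vocabulary term)."""
    vocab = build_vocabulary(docs)
    rows = []
    for doc in docs:
        counts = Counter(doc)  # missing terms count as 0
        rows.append([counts[term] for term in vocab])
    return vocab, rows


docs = [
    "text mining turns text into data".split(),
    "structured data is easy to analyze".split(),
]
vocab, matrix = document_term_matrix(docs)
print(vocab)  # ['analyze', 'data', 'easy', 'into', 'is', 'mining', 'structured', 'text', 'to', 'turns']
for row in matrix:
    print(row)
# [0, 1, 0, 1, 0, 1, 0, 2, 0, 1]
# [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]
```

Once text is in this tabular form, ordinary clustering, classification, or statistical methods can be run on the rows.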
Text Mining Paradigm
Text Mining Process Pipeline
• The process is essentially a linear pipeline.
• However, feedback from the text mining results may lead to changes in earlier steps (parsing, or even data collection).
Converting Text into Structured Data
• Converting text requires a large amount of preprocessing:
– Cleaning up ‘dirty’ text
• Remove markup tags from web documents, special symbols such as emoticons/emoji, and extraneous strings such as “AHHHHHHHHHHHHHHHHHHHHH”
• Correct misspelled words
– Tokenization
• Remove punctuation, normalize upper/lower case, etc.
– Sentence splitting
– Identifying multi-word expressions (e.g. “as well as”, “radio wave”) and named entities (e.g. “Allied Waste”, “Super Mario Bros.”)
• Adding other linguistic information
– Part-of-speech tags (e.g. noun, verb, adjective, adverb, preposition)
• Filtering non-significant/irrelevant words to reduce dimensionality
– Filtering non-content words using a stop-list (e.g. “the”, “a”, “an”, “and”)
– Combining tokens by stemming/lemmatizing or by using synonyms
• Other NLP features/techniques, e.g. n-grams, syntax trees
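The preprocessing steps above can be sketched in plain Python with the standard `re` module. This is a simplified illustration under stated assumptions: the stop-list is a tiny hypothetical one, the sentence splitter is naive (it would wrongly split after abbreviations such as “U.N.”), and `crude_stem` is a toy suffix stripper, not the Porter stemmer or a real lemmatizer.

```python
import re

# Hypothetical, deliberately tiny stop-list; real projects use a much fuller list.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in"}


def clean(raw: str) -> str:
    """Strip HTML-like tags and drop words containing a run of 4+ identical characters."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\b\w*(\w)\1{3,}\w*\b", " ", text)  # e.g. "AHHHHHHHH"


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Lowercase and keep alphanumeric runs, which also drops punctuation."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def crude_stem(token: str) -> str:
    """Toy suffix stripper (NOT Porter): removes a few common English endings."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(raw: str) -> list[list[str]]:
    """Clean -> split sentences -> tokenize -> drop stop words -> stem."""
    return [
        [crude_stem(t) for t in tokenize(s) if t not in STOP_WORDS]
        for s in split_sentences(clean(raw))
    ]


print(preprocess("<p>The servers are crashing AHHHHHHHH!</p> Customers reported errors."))
# [['server', 'crash'], ['customer', 'report', 'error']]
```

Each stage is a separate function so that, as the pipeline slide notes, an earlier step can be revised without rewriting the others.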
Text Mining Applications
• Text clustering
• Trend analysis
[Figure: Trend for the term “text mining” from Google Trends]
• Spam filtering
Text Mining – Sentiment Analysis
• Sentiment analysis: the field of sentiment analysis deals with the categorization (or classification) of opinions expressed in textual documents.
Sample tweets:
• “14 days after #DeMonetisation, PM seeks opinion instead of addressing the pain & anguish. This is called-Arrogate, subjugate & dictate!”
• “Two months after RBI Governor changes, #DeMonetisation happens. Can you imagine what will happen after CJI Thakur retires on 4 January 2017?”
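A very simple way to classify opinions is lexicon-based scoring: count words from a positive list and a negative list and compare the totals. The sketch below assumes a small hypothetical lexicon chosen to cover the sample tweets; practical systems use curated sentiment lexicons or trained classifiers and must also handle negation, sarcasm, and hashtags.

```python
import re

# Hypothetical hand-built lexicon for illustration only.
POSITIVE = {"good", "great", "support", "welcome", "success", "brave"}
NEGATIVE = {"pain", "anguish", "arrogate", "subjugate", "dictate", "chaos", "bad"}


def sentiment(text: str) -> str:
    """Label text positive/negative/neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())  # "#DeMonetisation" -> "demonetisation"
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweet_1 = ("14 days after #DeMonetisation, PM seeks opinion instead of addressing "
           "the pain & anguish. This is called-Arrogate, subjugate & dictate!")
tweet_2 = ("Two months after RBI Governor changes, #DeMonetisation happens. Can you "
           "imagine what will happen after CJI Thakur retires on 4 January 2017?")
print(sentiment(tweet_1))  # negative
print(sentiment(tweet_2))  # neutral
```

The second tweet comes out neutral even though a human reader senses an opinion in it, which illustrates why word counting alone is a weak baseline.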
Typical Text Pre-processing Methods
• Given a raw text (in a corpus), we typically preprocess it by applying one or more of the following methods:
1. Part-Of-Speech (POS) tagging – assign a POS tag to every word in each sentence of the text
2. Named Entity Recognition (NER) – identify named entities (proper nouns, and some common nouns that are relevant in the domain of the text)
3. Information Extraction (IE) – identify relations between phrases, and
extract the relevant/significant “information” described in the text
1. Part-Of-Speech (POS) Tagging
• POS tagging is the process of assigning a POS or lexical class marker to each word in a sentence (and to all sentences in a corpus).
Input: the lead paint is unsafe
Output: the/Det lead/N paint/N is/V unsafe/Adj
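The sketch below reproduces the input/output format above with a hypothetical lookup table that maps each word to a single tag and falls back to N for unknown words. It only illustrates the word/TAG output; real taggers use context (for example HMM, CRF, or neural models) to resolve ambiguous words such as “lead”, which can also be a verb.

```python
# Hypothetical mini-lexicon: one fixed tag per word, no use of context.
LEXICON = {"the": "Det", "lead": "N", "paint": "N", "is": "V", "unsafe": "Adj"}


def pos_tag(sentence: str, default: str = "N") -> str:
    """Return the sentence in word/TAG format, using `default` for unknown words."""
    return " ".join(f"{w}/{LEXICON.get(w, default)}" for w in sentence.split())


print(pos_tag("the lead paint is unsafe"))
# the/Det lead/N paint/N is/V unsafe/Adj
```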
2. Named Entity Recognition (NER)
• NER processes a text and identifies the named entities in each sentence, e.g. in “U.N. official Ekeus heads for Baghdad.” the named entities are “U.N.” (organization), “Ekeus” (person), and “Baghdad” (location).
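A minimal gazetteer (dictionary lookup) sketch for the example sentence. The gazetteer contents and the labels ORG/PER/LOC are assumptions for illustration; statistical NER models can recognize names they have never seen, which a fixed list cannot do.

```python
import re

# Hypothetical gazetteer: known names mapped to entity types.
GAZETTEER = {"U.N.": "ORG", "Ekeus": "PER", "Baghdad": "LOC"}


def find_entities(sentence: str) -> list[tuple[str, str]]:
    """Return (entity, type) pairs in order of appearance via exact gazetteer matches."""
    hits = []
    for name, label in GAZETTEER.items():
        # Escape dots in names like "U.N." and require non-word characters on both sides.
        for m in re.finditer(rf"(?<!\w){re.escape(name)}(?!\w)", sentence):
            hits.append((m.start(), name, label))
    return [(name, label) for _, name, label in sorted(hits)]


print(find_entities("U.N. official Ekeus heads for Baghdad."))
# [('U.N.', 'ORG'), ('Ekeus', 'PER'), ('Baghdad', 'LOC')]
```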
3. Information Extraction (IE)
• Identify specific pieces of information (data) in an
unstructured or semi-structured text
• Transform unstructured information in a corpus of texts or
web pages into a structured database (or templates)
• Applied to various types of text, e.g.
– Newspaper articles
– Scientific articles
– Web pages
– etc.
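Information extraction can be illustrated with a hand-written surface pattern that fills a template, here an acquisition record with acquirer, target, and year. The pattern, the `Acquisition` record, and the company names in the sample text are all invented for illustration; rule-based patterns like this are brittle, and learned relation extractors are typically used at scale.

```python
import re
from dataclasses import dataclass


@dataclass
class Acquisition:
    """One filled template, i.e. one row in the target structured database."""
    acquirer: str
    target: str
    year: int


# Hypothetical pattern: "<Capitalized Name> acquired <Capitalized Name> in <4-digit year>".
NAME = r"[A-Z][\w&]*(?: [A-Z][\w&]*)*"
PATTERN = re.compile(rf"(?P<acquirer>{NAME}) acquired (?P<target>{NAME}) in (?P<year>\d{{4}})")


def extract(text: str) -> list[Acquisition]:
    """Scan free text and return one Acquisition per pattern match."""
    return [Acquisition(m["acquirer"], m["target"], int(m["year"]))
            for m in PATTERN.finditer(text)]


news = ("Acme Corp acquired Globex Ltd in 2015. Analysts were surprised. "
        "Later, Initech acquired Hooli in 2019.")
for row in extract(news):
    print(row)
# Acquisition(acquirer='Acme Corp', target='Globex Ltd', year=2015)
# Acquisition(acquirer='Initech', target='Hooli', year=2019)
```

Note the doubled braces in `\d{{4}}`: inside an f-string they produce the literal regex quantifier `{4}`.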
Overview
• Tokenization
• Bag of words
• N-Grams
• TF*IDF
• Topic modeling with LDA (Latent Dirichlet Allocation)
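N-grams and TF*IDF can be sketched in a few lines. This assumes the common weighting tf(t, d) × log(N / df(t)), where tf is the term's count divided by the document length, N is the number of documents, and df(t) is the number of documents containing t. Libraries such as scikit-learn use smoothed variants, so their numbers will differ. The sample documents are invented.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token windows, e.g. n=2 gives bigrams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by (count / doc length) * log(N / df), one dict per document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


docs = [
    "demonetisation hurts small traders".split(),
    "demonetisation targets black money".split(),
    "small traders protest".split(),
]
print(ngrams(docs[0], 2))
# [('demonetisation', 'hurts'), ('hurts', 'small'), ('small', 'traders')]
for w in tf_idf(docs):
    print({t: round(v, 3) for t, v in w.items()})
```

Terms that occur in every document get log(N / N) = 0, so TF*IDF down-weights words that do not distinguish one document from another; here “hurts” outweighs “demonetisation” in the first document because it appears nowhere else.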

Analysing Demonetisation through Text Mining using Live Twitter Data!

  • 1. Introduction to Text Mining and Analytics 1 Copyright © Ivy Professional School - 2009-10 (All Rights Reserved) Analytics
  • 2. Wiki Definition Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text Source of Text Data 2 Copyright © Ivy Professional School - 2009-10 (All Rights Reserved) Source of Text Data Organizations today encounter textual data while running their day to day business. The source of the data could be electronic text, call center logs, social media, corporate documents, research papers, application forms, service notes, emails, etc.
  • 3. Unstructured Data • “80% of business-relevant information originates in unstructured form, primarily text.” (quoted in 2008) • “Based on the industry’s current estimations, unstructured data will occupy 90% of the data by volume in the entire digital space over the next decade.” (quoted in 2010)
  • 4. Text Mining and Analytics • Text analytics uses algorithms to turn free-form text (unstructured data) into data that can be analyzed (structured data) by applying statistical and machine learning methods, as well as Natural Language Processing (NLP) techniques. • Once structured data is obtained, the same mining and analytic techniques used for any other structured data can be applied. • The most significant part of text mining/analytics is therefore how to convert text into structured data.
  • 5. Text Mining Paradigm
  • 6. Text Mining Process Pipeline • The process is essentially a linear pipeline. • Feedback from the text mining results might require revisiting earlier stages, such as preprocessing and parsing, or even data collection.
  • 7. Converting Text into Structured Data • A large amount of preprocessing is required to convert text. – Cleaning up ‘dirty’ texts • Remove mark-up tags from web documents, special symbols such as emoticons/emojis, and extraneous strings such as “AHHHHHHHHHHHHHHHHHHHHH” • Correct misspelled words – Tokenization • Remove punctuation, normalize upper/lower case, etc. – Sentence splitting – Identifying multi-word expressions (e.g. “as well as”, “radio wave”) and Named Entities (e.g. “Allied Waste”, “Super Mario Bros.”) • Adding other linguistic information – Parts of speech (e.g. noun, verb, adjective, adverb, preposition) • Filtering non-significant/irrelevant words to reduce dimensions – Filtering non-content words using a stop-list (e.g. “the”, “a”, “an”, “and”) – Combining tokens by stemming/lemmatizing or using synonyms • Other NLP features/techniques, e.g. n-grams, syntax trees
  • 8. Text Mining Applications • Text Clustering • Trend Analysis (figure: trend for the term “text mining” from Google Trends) • Spam filtering
  • 9. Text Mining – Sentiment Analysis • Sentiment Analysis: the field of sentiment analysis deals with the categorization (or classification) of opinions expressed in textual documents. Sample Tweet: “14 days after #DeMonetisation, PM seeks opinion instead of addressing the pain & anguish. This is called-Arrogate, subjugate & dictate! Two months after RBI Governor changes, #DeMonetisation happens. Can you imagine what will happen after CJI Thakur retires on 4 January 2017?”
  • 10. Typical Text Pre-processing Methods • Given a raw text (in a corpus), we typically pre-process it by applying one or more of the following methods: 1. Part-Of-Speech (POS) tagging – assign a POS to every word in each sentence of the text 2. Named Entity Recognition (NER) – identify named entities (proper nouns and some common nouns that are relevant in the domain of the text) 3. Information Extraction (IE) – identify relations between phrases, and extract the relevant/significant “information” described in the text
  • 11. 1. Part-Of-Speech (POS) Tagging • POS tagging is the process of assigning a POS or lexical class marker to each word in a sentence (and to all sentences in a corpus). Input: the lead paint is unsafe; Output: the/Det lead/N paint/N is/V unsafe/Adj 2. Named Entity Recognition (NER) • NER processes a text and identifies the named entities in each sentence, e.g. “U.N. official Ekeus heads for Baghdad.” (U.N. is an organization, Ekeus a person, Baghdad a location)
  • 12. 3. Information Extraction (IE) • Identify specific pieces of information (data) in an unstructured or semi-structured text • Transform unstructured information in a corpus of texts or web pages into a structured database (or templates) • Applied to various types of text, e.g. – Newspaper articles – Scientific articles – Web pages – etc.
  • 13. Overview • Tokenization • Bag of words • N-Grams • TF*IDF • Topic modeling: LDA (Latent Dirichlet allocation)
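The cleaning, sentence-splitting, tokenization, stop-word filtering and stemming steps listed on slide 7 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the deck's own implementation: the function names, the tiny STOP_WORDS set and the crude_stem suffix rules are hypothetical, and a production system would normally use a real stemmer or lemmatizer (e.g. Porter's) and a full stop-list.

```python
import re

# Hypothetical, deliberately tiny stop-list; real stop-lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "are", "is", "of", "to"}


def clean(text: str) -> str:
    """Clean up 'dirty' text: mark-up tags, emoji and letter runs like 'AHHHHH'."""
    text = re.sub(r"<[^>]+>", " ", text)                    # replace HTML/XML tags with spaces
    text = text.encode("ascii", "ignore").decode("ascii")   # crude: drops emoji but also accented letters
    return re.sub(r"([A-Za-z])\1{2,}", r"\1\1", text)       # shorten runs of 3+ identical letters to 2


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Lower-case the sentence and keep word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


def crude_stem(token: str) -> str:
    """Hypothetical suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[list[str]]:
    """Clean -> split sentences -> tokenize -> drop stop words -> stem."""
    return [
        [crude_stem(t) for t in tokenize(s) if t not in STOP_WORDS]
        for s in split_sentences(clean(text))
    ]


print(preprocess("<p>The parsers are cleaning texts!</p> AHHHHHH the emails \U0001F600 and documents."))
# [['parser', 'clean', 'text'], ['ahh', 'email', 'document']]
```

Removing every non-ASCII character is a blunt way to strip emoji; it also removes accented letters, so multilingual corpora need a more careful filter.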
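Spam filtering (slide 8) is a classic text classification task. One common baseline, swapped in here for illustration because the slides do not name a classifier, is multinomial Naive Bayes over bag-of-words counts with add-one (Laplace) smoothing. The class name, the tokenizer and the four training messages below are hypothetical toy data, not part of the original material.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-cased alphanumeric tokens (a bag-of-words view of the message)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes classifier with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)                       # used for class priors
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokens(doc))
        self.vocab = set().union(*self.word_counts.values())   # all words seen in training
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)      # log prior
            for w in tokens(doc):
                if w in self.vocab:                             # skip words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy training data.
train_docs = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "please review the project report",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesSpamFilter().fit(train_docs, train_labels)
print(clf.predict("claim your free prize"))      # spam
print(clf.predict("project meeting on monday"))  # ham
```

Probabilities are summed in log space to avoid floating-point underflow on long messages; words never seen in training are skipped rather than smoothed.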
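A very simple approach to the sentiment categorization described on slide 9 is lexicon-based scoring: count the positive and negative words in a text and compare the two counts. The sketch below assumes hypothetical, hand-picked word lists (POSITIVE, NEGATIVE) and applies them to an excerpt of the sample tweet; it is not the method used in the deck, and practical systems rely on much larger lexicons or trained classifiers that also handle negation and sarcasm.

```python
import re

# Hypothetical toy lexicons; real sentiment lexicons contain thousands of entries.
POSITIVE = {"good", "great", "relief", "welcome", "support"}
NEGATIVE = {"pain", "anguish", "subjugate", "bad", "worst"}


def sentiment(text: str) -> tuple[str, int]:
    """Return (label, score) where score = #positive words - #negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", 0


tweet = ("14 days after #DeMonetisation, PM seeks opinion instead of addressing "
         "the pain & anguish. This is called-Arrogate, subjugate & dictate!")
print(sentiment(tweet))
# ('negative', -3)
```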
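The POS example on slide 11 can be reproduced with a unigram (most-frequent-tag) tagger, one of the simplest tagging methods: each word receives the tag it carried most often in a hand-tagged training corpus. The four training sentences, the tag names and the fallback tag "N" for unknown words are assumptions for illustration; real taggers are trained on large annotated corpora and take context into account.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training corpus of (word, tag) pairs.
TAGGED = [
    [("the", "Det"), ("paint", "N"), ("is", "V"), ("wet", "Adj")],
    [("a", "Det"), ("lead", "N"), ("pipe", "N"), ("is", "V"), ("unsafe", "Adj")],
    [("they", "Pro"), ("lead", "V"), ("the", "Det"), ("team", "N")],
    [("the", "Det"), ("lead", "N"), ("singer", "N"), ("is", "V"), ("unsafe", "Adj")],
]


def train_unigram_tagger(tagged_sentences):
    """Map each word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(sentence, model, default="N"):
    """Tag whitespace-separated words; unknown words get the default tag."""
    return [(w, model.get(w.lower(), default)) for w in sentence.split()]


model = train_unigram_tagger(TAGGED)
print(" ".join(f"{w}/{t}" for w, t in tag("the lead paint is unsafe", model)))
# the/Det lead/N paint/N is/V unsafe/Adj
```

Because it ignores context, this tagger labels "lead" as a noun even in "they lead the team"; contextual models (e.g. HMM or neural taggers) are needed to resolve that kind of ambiguity.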
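A minimal form of the information extraction described on slide 12 is pattern-based template filling: a hand-written pattern picks out the slots of one relation and emits a structured record per match. The sketch below assumes a single hypothetical "ORG official PERSON heads for LOCATION" pattern built around the NER example sentence from slide 11; real IE systems combine NER output with many learned or hand-crafted patterns.

```python
import re

# Hypothetical template: "<ORG> official <PERSON> heads for <LOCATION>"
TRAVEL_PATTERN = re.compile(
    r"(?P<org>[A-Z][\w.]*) official (?P<person>[A-Z]\w+) heads for (?P<destination>[A-Z]\w+)"
)


def extract_travel_events(texts):
    """Turn free text into structured rows (one dict per pattern match)."""
    rows = []
    for text in texts:
        for match in TRAVEL_PATTERN.finditer(text):
            rows.append(match.groupdict())
    return rows


corpus = [
    "U.N. official Ekeus heads for Baghdad.",
    "The lead paint is unsafe.",  # no match, so no row
]
for row in extract_travel_events(corpus):
    print(row)
# {'org': 'U.N.', 'person': 'Ekeus', 'destination': 'Baghdad'}
```

Each dict corresponds to one row of the structured database (template) mentioned on slide 12.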
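The bag-of-words, n-gram and TF*IDF items on slide 13 can be illustrated together. The sketch below assumes whitespace tokenization of already-cleaned text and uses the basic weighting tf-idf(t, d) = count(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. Libraries often use smoothed or normalized variants, so their numbers will differ; the three example documents are made up.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n=1):
    """One term-count Counter per document (n=1: words, n=2: bigrams, ...)."""
    return [Counter(ngrams(doc.lower().split(), n)) for doc in docs]


def tf_idf(bags):
    """Weight each term count by log(N / document frequency)."""
    n_docs = len(bags)
    doc_freq = Counter(term for bag in bags for term in bag)  # each term counted once per document
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in bag.items()}
        for bag in bags
    ]


docs = ["text mining uses text", "data mining uses data", "text analytics"]
weights = tf_idf(bag_of_words(docs))
print([max(w, key=w.get) for w in weights])
# ['text', 'data', 'analytics']
print(ngrams("text mining uses text".split(), 2))
# ['text mining', 'mining uses', 'uses text']
```

A term that occurs in every document gets weight log(1) = 0, which is how TF*IDF down-weights words that do not help distinguish documents.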