3. Outline
Introduction
Data Mining vs Text Mining
Text Mining Process
Text Mining Applications
Challenges in Text Mining
Conclusion
4. Introduction
What is text mining? And why text mining?
Text mining is the analysis of data contained in natural language text.
A massive amount of new information is being created: the world’s data
doubles every 18 months (Jacques Vallee, Ph.D.)
80-90% of all data is held in various unstructured formats
Useful information can be derived from this unstructured data
6. How Text Mining Differs from Data Mining
Data Mining
Identify data sets
Select features
Prepare data
Analyze distribution
Text Mining
Identify documents
Extract features
Select features by algorithm
Prepare data
Analyze distribution
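The extra "Extract features" step is what separates the two lists: a data set already has columns, while documents must first be turned into them. A minimal sketch of that step, assuming simple word counts as the features (the function names and sample documents are illustrative, not from the slides):

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def extract_features(documents):
    """Turn raw documents into a vocabulary and one row of counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary (the "columns") is the sorted set of all words seen.
    vocabulary = sorted(set().union(*counts))
    # Each row holds the count of every vocabulary word in one document.
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows

# Hypothetical sample documents, used only for illustration.
docs = [
    "Text mining finds patterns in text.",
    "Data mining finds patterns in data.",
]
vocabulary, rows = extract_features(docs)
print(vocabulary)
for row in rows:
    print(row)
```

Once the documents are in this tabular form, the remaining steps (select features, prepare data, analyze distribution) can proceed much as they would for structured data.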
7. Text mining process
Text preprocessing: syntactic/semantic text analysis
Features generation: bag of words
Features selection: simple counting, statistics
Text/data mining: classification (supervised learning), clustering (unsupervised learning)
Analyzing results: mapping/visualization, result interpretation
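The stages above can be sketched end to end in a few lines. The example below is a minimal illustration, not a reference implementation: it assumes a tiny hand-written stopword list for preprocessing, document-frequency counting for feature selection, and a multinomial naive Bayes classifier with add-one smoothing as the supervised learner (the slides do not name a specific algorithm). The training texts and labels are made up.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "your", "at"}

def preprocess(text):
    """Text preprocessing: lowercase, tokenize, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def bag_of_words(text):
    """Features generation: map each word to its count in the text."""
    return Counter(preprocess(text))

def select_features(bags, min_docs=2):
    """Features selection by simple counting: keep words found in >= min_docs documents."""
    doc_freq = Counter(word for bag in bags for word in bag)
    return {word for word, n in doc_freq.items() if n >= min_docs}

def train(bags, labels, vocabulary):
    """Supervised learning: collect per-class document and word counts."""
    class_docs = Counter(labels)
    word_counts = {label: Counter() for label in class_docs}
    for bag, label in zip(bags, labels):
        for word, n in bag.items():
            if word in vocabulary:
                word_counts[label][word] += n
    return class_docs, word_counts

def classify(text, model, vocabulary):
    """Return the class with the highest naive Bayes log-score."""
    class_docs, word_counts = model
    total_docs = sum(class_docs.values())
    bag = bag_of_words(text)
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for word, n in bag.items():
            if word in vocabulary:
                # Add-one (Laplace) smoothing avoids log(0) for unseen words.
                p = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
                score += n * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled documents.
texts = [
    "Win a free prize now",
    "Free money, claim your prize",
    "Meeting agenda for the project",
    "Project meeting moved to Monday",
]
labels = ["spam", "spam", "ham", "ham"]

bags = [bag_of_words(t) for t in texts]
vocabulary = select_features(bags)
model = train(bags, labels, vocabulary)
print(sorted(vocabulary))
print(classify("Claim your free prize", model, vocabulary))
```

The final stage, analyzing results, is left out here; in practice it would mean inspecting which words drive each decision or plotting the class distribution.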
8. Text mining applications
Call Center Software.
Anti-Spam.
Market Intelligence.
Web mining.
Web log analysis.
9. Challenges in Text Mining
Information is in unstructured textual form, written in
natural language (NL).
It is not readily accessible to computers.
Dealing with huge collections of documents.
Requires a skilled person to choose which documents
to process and to analyze the output.
Requires more time.
Cost: about $50,000 for the software alone.
10. Conclusion
Finally, most sources indicate that the field of text mining is still in the
research phase and that its applications are in limited operation at present.
However, the possibilities it offers, helping to understand huge amounts of
text and to extract the important and useful information at its core, hold
promise in many areas.