3. What is Text Mining?
Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving
high-quality information from text… (https://en.wikipedia.org/wiki/Text_mining)
● Information Extraction
● Sentiment Analysis
● Keyword Extraction
● Classification
● Clustering
● Natural Language Processing
● Information Retrieval
4. What is Elasticsearch?
● Scalable search and analytics engine
● Based on Apache Lucene
● Roughly a NoSQL database
● Very popular and powerful
● Easy to use (REST/HTTP)
10. 4. Classification
a) Training data:
● You need an index with at least 2 fields
○ content - analyzed String (text)
○ category - not_analyzed String (keyword)
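As a minimal sketch of such a training index, the snippet below builds an index-creation body with the two fields from the slide. The helper names are hypothetical, and the typeless `mappings.properties` layout assumes a recent Elasticsearch version; older versions expect a mapping type name at that level. Nothing is sent to a cluster here; the body is only printed as JSON.

```python
import json


def build_training_index_body():
    """Return an index-creation body with the two fields named on the slide.

    Assumption: typeless mappings as used by recent Elasticsearch versions.
    """
    return {
        "mappings": {
            "properties": {
                # Analyzed full text: tokenized, so it can be matched by similarity.
                "content": {"type": "text"},
                # Not analyzed: stored as one exact value, usable as a label.
                "category": {"type": "keyword"},
            }
        }
    }


def build_training_doc(content, category):
    """Return one labeled training document (hypothetical helper)."""
    return {"content": content, "category": category}


if __name__ == "__main__":
    print(json.dumps(build_training_index_body(), indent=2))
    print(json.dumps(build_training_doc("The shuttle reached orbit.", "sci.space")))
```

The body would be sent with a PUT request to the index URL, and each training document indexed into it with its known category.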
11. 4. Classification
b) Classification:
● MoreLikeThis Query:
○ Document to classify = like_text
○ Aggregate categories of top 10 hits with scores
● Evaluation on the 20 Newsgroups dataset:
○ Recall: 1.0 - Precision: 0.71
○ Recall: 0.12 - Precision: 0.92
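The MoreLikeThis step can be sketched roughly as follows. This is an illustration under assumptions, not the presenter's implementation: the function names are hypothetical, the query uses the `like` parameter (the slide's `like_text` is the older spelling), and "aggregate with scores" is interpreted here as summing the `_score` of each hit per category. The hit list mimics the `hits.hits` part of a search response.

```python
from collections import defaultdict


def build_mlt_query(text, field="content", size=10):
    """Build a search body that finds the `size` documents most similar to `text`."""
    return {
        "size": size,  # top 10 hits, as on the slide
        "query": {
            "more_like_this": {
                "fields": [field],
                "like": text,  # the document to classify
            }
        },
    }


def classify_from_hits(hits, category_field="category"):
    """Sum hit scores per category and return (category, score) pairs, best first.

    Assumption: summing scores is one plausible aggregation; the slide does
    not say which one was used.
    """
    totals = defaultdict(float)
    for hit in hits:
        totals[hit["_source"][category_field]] += hit["_score"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hand-made hits standing in for a real search response.
    sample_hits = [
        {"_score": 2.0, "_source": {"category": "sci.space"}},
        {"_score": 1.5, "_source": {"category": "rec.autos"}},
        {"_score": 1.0, "_source": {"category": "sci.space"}},
    ]
    print(build_mlt_query("The shuttle reached orbit."))
    print(classify_from_hits(sample_hits))
```

The first entry of the returned list is the predicted category; a minimum score could be required before accepting it, which would trade recall for precision.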