Medical Persona Classification in Social Media
Nikhil Pattisapu (1), Manish Gupta (1, 2), Ponnurangam Kumaraguru (3), Vasudeva Varma (1)
(1) IIIT Hyderabad
(2) Microsoft India
(3) IIIT Delhi
Advances in Social Network Analysis and Mining 2017
ASONAM 2017 1 / 30
Overview
Motivation
Problem Definition
Related Work
Dataset
Approach
Evaluation Metrics
Experiments
Results
Analysis and Conclusion
Future Work
Motivation
What is Medical Persona?
User groups and content providers of Web 2.0 applications in
healthcare. Some examples -
Patient
Caretaker
Consultant
Journalist
Pharmacist
Researcher
Other
Motivation
Pharmaceutical firms use medical social media for drug
marketing and pharmacovigilance.
Figure: Sample post from drugs.com describing a patient’s experiences
with the drug Keppra.
Motivation
Use cases
A few use cases for identifying medical personae are listed below.
To gather information about drug usage, adverse events,
benefits and side effects from patients.
To find out the kind of informational assistance sought by
caretakers and make such information readily available.
To identify key opinion leaders in a drug or disease area.
To find out if a doctor has patients who can take part in a
clinical trial.
To gather information on conversations between pharmacists
and others to identify drug dosage, interactions and
therapeutic effects.
To acquire or collaborate on technologies invented by
researchers that can be a part of the drug pipeline.
To gather information about journalists’ surveys on the quality of
life of patients.
Problem Definition
Given a social media post, identify the medical personae associated
with it.
We pose this as a multi-label text classification problem, where our
label set is {Patient, Caretaker, Consultant, Journalist, Pharmacist,
Researcher, Other}.
There are two primary reasons for framing this as a multi-label
classification task (as opposed to single-label):
A post may involve a conversation between multiple personae,
for example, a blog describing a patient-consultant conversation.
A post may be ambiguous and hence can potentially be mapped
to more than one label by a human annotator.
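As an illustration of the multi-label setup, the label set can be encoded as a 0/1 indicator vector per post. This is only a minimal sketch: the function name `to_indicator` and the example post are assumptions made for illustration, not part of the original work.

```python
# Label set taken from the problem definition.
PERSONAE = ["Patient", "Caretaker", "Consultant", "Journalist",
            "Pharmacist", "Researcher", "Other"]


def to_indicator(labels):
    """Encode a set of personae as a 0/1 vector ordered like PERSONAE."""
    unknown = set(labels) - set(PERSONAE)
    if unknown:
        raise ValueError(f"unknown personae: {sorted(unknown)}")
    return [1 if p in labels else 0 for p in PERSONAE]


# Hypothetical example: a blog describing a patient-consultant conversation
# carries two labels at once.
example = to_indicator({"Patient", "Consultant"})
print(example)  # [1, 0, 1, 0, 0, 0, 0]
```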
Related Work
This problem is closely related to two problems that are
thoroughly studied in the literature:
Authorship Attribution - The task of determining the author
of a particular document
Automatic Genre Identification (AGI) - The task of classifying
documents based on genres (which includes their form,
structure, functional trait, communicative purpose, targeted
audience and narrative style) rather than the content, topics
or subjects that the documents span.
Related Work
State-of-the-art Methods
For both authorship attribution and AGI, supervised algorithms
based on extensive feature engineering have been proposed. The
top features include:
Word n-grams
Character n-grams
Common words
Function words
Part-of-speech tags
Document statistics (e.g. document length)
HTML tags
Stylistic features
Acronyms
Hashtags and reply mentions
Related Work
Why can’t existing methods be trivially adapted?
Different features need to be explored for the medical domain.
Unlike most methods proposed in the literature, our task
is closed-set and multi-label.
Each persona covers many users and is itself
heterogeneous.
Dataset
Figure: Dataset Collection (Query -> Blog / Tweet Search API -> Blogs / Tweets -> Noise Filtering & Deduplication -> Human Annotation -> Labeled Blogs / Tweets)
Our dataset consists of both blogs and tweets.
Examples of queries include drug names such as minocycline, qvar
and gilenya.
When drug names alone returned a lot of
irrelevant content, drug-disease pairs (e.g. acne minocycline)
were used as queries.
We used 50 queries and retrieved 50 blogs and 30 tweets per
query.
Noisy posts and retweets were removed.
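The retweet removal and deduplication step might look roughly like the sketch below. The slides do not say how retweets or duplicates were detected, so the `RT @` prefix check, the whitespace and case normalization, and the function name `clean_posts` are all assumptions for illustration.

```python
def clean_posts(posts):
    """Drop retweets and duplicate posts, keeping the first occurrence."""
    seen = set()
    kept = []
    for text in posts:
        # Assumption: duplicates are compared after lowercasing and
        # collapsing whitespace.
        normalized = " ".join(text.lower().split())
        # Assumption: a retweet is marked by the conventional "RT @" prefix.
        if normalized.startswith("rt @"):
            continue
        if normalized in seen:
            continue
        seen.add(normalized)
        kept.append(text)
    return kept


# Hypothetical posts, not taken from the dataset.
posts = [
    "Minocycline cleared my acne",
    "RT @user: Minocycline cleared my acne",
    "minocycline  cleared my ACNE",
    "Started gilenya today",
]
print(clean_posts(posts))
```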
Dataset
Figure: Dataset Statistics
1581 blogs and 1025 tweets were annotated.
The inter-annotator agreement among 4 annotators was
found to be 0.708 for blogs and 0.70 for tweets.
The label cardinality (average number of labels per post) of blogs
and tweets was 1.18 and 1.24 respectively.
The maximum number of labels assigned to a blog was 2 and to a
tweet was 3.
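Label cardinality is the average number of labels per post. A minimal sketch of how it and the per-post maximum could be computed is given here; the toy label sets are invented and do not reproduce the reported statistics.

```python
def label_cardinality(label_sets):
    """Average number of labels per annotated post."""
    return sum(len(s) for s in label_sets) / len(label_sets)


def max_labels(label_sets):
    """Largest number of labels attached to any single post."""
    return max(len(s) for s in label_sets)


# Toy annotations (hypothetical).
toy = [{"Patient"}, {"Patient", "Caretaker"}, {"Researcher"}, {"Consultant"}]
print(label_cardinality(toy))  # 1.25
print(max_labels(toy))         # 2
```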
Approach
Overview
We first transform the multi-label task into one or more
single-label tasks using
Binary relevance transformation
Label powerset transformation
We then use the following approaches to solve this task
N-gram approach
Feature Engineering
Averaged Word Vectors
CNN-LSTM
Approach
Label transformation method
Binary Relevance Method
We train an individual classifier for each label.
Given an unseen sample, the combined model then predicts all
labels for this sample for which the respective classifiers
predict a positive result.
Label Powerset Method
We train one binary classifier for every label combination
attested in the training set
For an unseen example, prediction is done using a voting
scheme.
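The two transformations can be illustrated on the training targets alone. The sketch below only shows how the label sets are rewritten; it does not train any classifier, and for the label powerset it uses the common formulation of one class id per label combination seen in training, which may differ in detail from the setup used in this work. Function names and toy data are assumptions.

```python
def binary_relevance_targets(label_sets, labels):
    """One 0/1 target list per label: targets[label][i] is 1 if post i has it."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def label_powerset_targets(label_sets):
    """Map each distinct label combination seen in training to one class id."""
    combos = sorted({tuple(sorted(s)) for s in label_sets})
    class_id = {combo: i for i, combo in enumerate(combos)}
    return [class_id[tuple(sorted(s))] for s in label_sets], combos


# Toy training labels (hypothetical).
train = [{"Patient"}, {"Patient", "Caretaker"}, {"Consultant"}, {"Patient"}]

br = binary_relevance_targets(train, ["Patient", "Caretaker", "Consultant"])
lp, combos = label_powerset_targets(train)
print(br["Patient"])  # [1, 1, 0, 1]
print(lp)             # [2, 0, 1, 2]
print(combos)
```

Binary relevance yields one independent binary problem per persona, while the label powerset keeps co-occurring personae together at the cost of only being able to predict combinations seen in training.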
Approach
N-gram approach (Baseline)
Each document is represented as a TF-IDF vector over the
entire vocabulary.
An SVM is trained to classify the document into one or more
of the pre-defined personae.
Both word n-grams and character n-grams are used.
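The TF-IDF representation used by the baseline can be sketched in plain Python for character n-grams. This is an illustrative, unsmoothed variant (term frequency times log(N / document frequency)); the exact weighting, the n-gram sizes and the SVM training step used in the experiments are not specified on the slides and are not reproduced here.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Return one sparse {ngram: tf-idf weight} dict per document."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({g: (k / total) * math.log(n_docs / df[g])
                        for g, k in c.items()})
    return vectors


# Hypothetical two-document corpus.
vecs = tfidf_vectors(["keppra helps", "keppra hurts"])
print(vecs[0]["kep"], vecs[0]["hel"])
```

N-grams shared by every document (such as "kep" here) get weight zero, so the vectors emphasize what distinguishes one post from another.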
Averaged Word Vectors
$$\text{document vector}(d_i) = \frac{\sum_{w^i_j} \text{word embedding}(w^i_j)}{\text{len}(p_i)} \tag{1}$$
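A minimal sketch of the averaging in equation (1): the word embeddings of a post are summed and divided by the post length. The tiny two-dimensional embedding table is invented, and skipping out-of-vocabulary words in the sum is an assumption, since the slides do not say how such words are handled.

```python
def document_vector(tokens, embeddings, dim):
    """Sum the token embeddings and divide by the post length, as in (1)."""
    total = [0.0] * dim
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            # Assumption: out-of-vocabulary words contribute nothing.
            continue
        for k in range(dim):
            total[k] += vec[k]
    if not tokens:
        return total
    return [x / len(tokens) for x in total]


# Hypothetical 2-dimensional embeddings.
emb = {"fever": [1.0, 0.0], "dengue": [0.0, 2.0]}
print(document_vector(["fever", "dengue"], emb, 2))  # [0.5, 1.0]
```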
Approach
Word Embedding Details
ID  Training Source          Training Algorithm  #Dim  #Entries  Domain
1   Medical Tweets (ADR)     Word2Vec            200   1344629   Medical
2   Twitter                  GloVe               200   1193515   Generic
3   Web crawl 1              GloVe               300   2196018   Generic
4   Web crawl 2              GloVe               300   1917495   Generic
5   PubMed, PMC, Wikipedia   Word2Vec            200   5443656   Medical
Table: Pre-trained Word Embedding Details
Approach
Feature Engineering
For this task, we manually engineered a total of 89 features,
distributed in 6 feature types.
Document Level features (4)
Capture generic features of a post
Examples - Number of sentences, average sentence length,
average word length
Pharmacist blogs are lengthier than Patient blogs.
POS features (33)
Capture the distribution of different Parts-of-Speech in the
document.
Example - Number of Adjectives
A Consultant is 1.6 times more likely to use adjectives than a
journalist.
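The document-level features named above (number of sentences, average sentence length, average word length), together with a character count, could be computed as in this sketch. The regular expressions used for sentence and word splitting are simplifying assumptions, not the tokenization used in the original experiments.

```python
import re


def document_features(text):
    """Simple document-level statistics for one post."""
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: words are runs of letters, digits and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    n_sent = len(sentences)
    n_words = len(words)
    return {
        "num_sentences": n_sent,
        "avg_sentence_length": n_words / n_sent if n_sent else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "num_characters": len(text),
    }


# Hypothetical post.
feats = document_features("I took Keppra. It helped!")
print(feats)
```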
List lookup features (7)
Include the average frequency of terms that occur both in the
document and in a particular list.
Example - List of abusive words.
The terms MD, Dr., MBBS, FRCS and consultation fee were
found to be more frequent in consultant blogs than in others.
Syntactic features (7)
Capture the presence or absence of various classes of terms.
Example - date, person, location, organization, time, money,
and percentage amounts.
Researcher blogs contain more percentage mentions than
others.
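A list lookup feature of the kind described above can be sketched as the fraction of a post's tokens that also appear in a given term list. Reading "average frequency" this way is an assumption, as are the function name and the handling of single-token terms only (a multi-word entry such as "consultation fee" would need phrase matching).

```python
def list_lookup_feature(tokens, term_list):
    """Fraction of tokens in the post that also appear in term_list."""
    if not tokens:
        return 0.0
    terms = {t.lower() for t in term_list}
    hits = sum(1 for tok in tokens if tok.lower() in terms)
    return hits / len(tokens)


# Single-token terms from the consultant example; the post is hypothetical.
consultant_terms = ["MD", "Dr.", "MBBS", "FRCS"]
tokens = "Dr. Rao MBBS sees patients daily".split()
print(list_lookup_feature(tokens, consultant_terms))
```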
Semantic features (35)
Consist of many features specific to the medical domain
Examples - number of disease mentions, drug mentions,
chemical mentions, organ mentions
The distribution across these features gives significant clues
about the persona.
These features were extracted using MetaMap.
Tweet specific features
Consist of features specific to tweets only
Examples - number of hashtags
Approach
CNN Architecture
For experiments related to tweets, we use the following CNN
architecture
Figure: CNN. Layers shown: pre-trained word embedding layer, convolution layer, max-pooling layer, softmax / sigmoid output. Example input: "I am suffering pneumonia".
Approach
CNN-LSTM Architecture
For experiments related to blogs, we use the following CNN-LSTM
architecture
Figure: CNN-LSTM. Layers shown: pre-trained word embedding layer, convolution layer, max-pooling layer, sequential layer (LSTM units), softmax / sigmoid layer. Example input sentences: "I treated a patient", "He was suffering fever", "Hygiene highly impacts dengue".
Evaluation Metrics
Each evaluation metric is defined on a per-instance basis and
is subsequently averaged over all instances to obtain the aggregate
value.
Let l and pr be the true label set and predicted label set for
document d
$$\text{Exact Match} = \begin{cases} 1 & \text{if } l = pr \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
Jaccard Similarity = |l ∩ pr|/|l ∪ pr| (3)
Precision = |l ∩ pr|/|pr| (4)
Recall = |l ∩ pr|/|l| (5)
F − Score = 2 ∗ Precision ∗ Recall/(Precision + Recall) (6)
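The per-instance metrics (2) to (6) and their averaging over instances can be sketched as follows. The conventions for empty label sets (precision and recall of 0, Jaccard of 1 when both sets are empty) are assumptions, since the slides do not define those cases; the toy predictions are invented.

```python
def instance_metrics(true, pred):
    """Metrics (2)-(6) for one document, with label sets true (l) and pred (pr)."""
    true, pred = set(true), set(pred)
    inter = len(true & pred)
    union = len(true | pred)
    precision = inter / len(pred) if pred else 0.0
    recall = inter / len(true) if true else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "exact_match": 1.0 if true == pred else 0.0,
        "jaccard": inter / union if union else 1.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def average_metrics(pairs):
    """Average each per-instance metric over all (true, predicted) pairs."""
    rows = [instance_metrics(t, p) for t, p in pairs]
    return {k: sum(r[k] for r in rows) / len(rows) for k in rows[0]}


# Toy predictions (hypothetical).
pairs = [({"Patient"}, {"Patient"}),
         ({"Patient", "Caretaker"}, {"Patient"})]
print(average_metrics(pairs))
```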
$$\text{Hamming Loss} = \frac{\sum_{j=1}^{|L|} \text{xor}(l_j, pr_j)}{|L|} \tag{7}$$
Hamming Score = 1 − Hamming Loss (8)
where l_j and pr_j denote the jth elements of l and pr respectively.
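Equations (7) and (8) can be sketched by treating l and pr as 0/1 indicator vectors of length |L|, one position per persona. That vector reading is an assumption made for this example, and the two vectors are invented.

```python
def hamming_loss(true_vec, pred_vec):
    """Fraction of label positions where truth and prediction disagree."""
    if len(true_vec) != len(pred_vec):
        raise ValueError("vectors must have the same length")
    return sum(t ^ p for t, p in zip(true_vec, pred_vec)) / len(true_vec)


def hamming_score(true_vec, pred_vec):
    """Complement of the Hamming loss, as in equation (8)."""
    return 1 - hamming_loss(true_vec, pred_vec)


# Hypothetical 7-label indicator vectors that differ in two positions.
l = [1, 0, 1, 0, 0, 0, 0]
pr = [1, 0, 0, 0, 0, 0, 1]
print(hamming_loss(l, pr), hamming_score(l, pr))
```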
Experimental Details
Throughout this work, we conduct 10-fold cross-validation
experiments.
For extracting semantic features we use MetaMap.
For tuning hyperparameters in the CNN and CNN-LSTM models,
we used a grid search over the hyperparameter space,
which includes
Number of convolution filters
Filter sizes
Activation Functions (ReLU and sigmoid)
Size of hidden layer
Number of epochs
We select the configuration which maximizes the F-Score on a
hold-out validation set.
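The grid search described above can be sketched as an exhaustive loop over all hyperparameter combinations that keeps the configuration with the best validation score. The grid values and the `toy_validation_f_score` function are hypothetical stand-ins: a real run would train the CNN or CNN-LSTM for each configuration and measure the F-Score on the hold-out validation set.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every combination in grid; return the best config and its score."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_f_score(config):
    """Hypothetical stand-in for training a model and scoring it."""
    score = 0.5 + 0.01 * config["num_filters"] / 64
    score -= 0.02 * abs(config["filter_size"] - 3)
    if config["activation"] == "relu":
        score += 0.03
    return score


# Hypothetical grid; the values actually searched are not given on the slides.
grid = {
    "num_filters": [64, 128],
    "filter_size": [2, 3, 4],
    "activation": ["relu", "sigmoid"],
}
best_config, best_score = grid_search(grid, toy_validation_f_score)
print(best_config, best_score)
```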
Results
Blogs
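Approach             LT Method  Emb Id  JS     EM     HS     F-Score
Word unigrams        BR         -       0.446  0.393  0.870  0.520
Word unigrams        LP         -       0.566  0.511  0.865  0.570
Character n-grams    BR         -       0.460  0.401  0.871  0.530
Character n-grams    LP         -       0.577  0.523  0.868  0.580
Feature Engineering  BR         -       0.461  0.409  0.872  0.530
Feature Engineering  LP         -       0.574  0.518  0.867  0.580
Averaged Word2Vec    BR         3       0.608  0.521  0.880  0.600
Averaged Word2Vec    LP         4       0.627  0.568  0.886  0.640
CNN-LSTM             BR         3       0.496  0.421  0.846  0.460
CNN-LSTM             LP         3       0.586  0.514  0.869  0.600
Table: Results of all Approaches for Blogs
(LT Method: label transformation method, BR = Binary Relevance, LP = Label Powerset; Emb Id: ID of the pre-trained word embedding; JS: Jaccard Similarity; EM: Exact Match; HS: Hamming Score)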
Results
Tweets
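Approach             LT Method  Emb Id  JS     EM     HS     F-Score
Word unigrams        BR         -       0.427  0.352  0.862  0.500
Word unigrams        LP         -       0.518  0.441  0.846  0.510
Character n-grams    BR         -       0.421  0.353  0.864  0.480
Character n-grams    LP         -       0.513  0.435  0.845  0.490
Feature Engineering  BR         -       0.450  0.366  0.865  0.520
Feature Engineering  LP         -       0.540  0.455  0.852  0.540
Averaged Word2Vec    BR         3       0.563  0.469  0.863  0.560
Averaged Word2Vec    LP         4       0.544  0.462  0.853  0.520
CNN                  BR         4       0.593  0.499  0.873  0.590
CNN                  LP         4       0.582  0.489  0.864  0.580
Table: Results of all Approaches for Tweets
(LT Method: label transformation method, BR = Binary Relevance, LP = Label Powerset; Emb Id: ID of the pre-trained word embedding; JS: Jaccard Similarity; EM: Exact Match; HS: Hamming Score)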
Analysis
Feature Analysis
Feature Group   Best Feature (Blogs)                        Best Feature (Tweets)
Document        # characters (3)                            # characters (8)
Syntactic       # Money mentions (2)                        # Money mentions (6)
List lookup     # matching words with consultant list (1)   # matching words with patient word list (29)
Semantic        # Inorganic chemical (38)                   # research activity (34)
POS             # Foreign word (163)                        # Personal Pronoun (116)
Tweet specific  -                                           # hashtags (9)
Table: Feature Analysis for Blogs and Tweets based on the χ² metric.
The number in parentheses indicates the feature rank (lower is better).
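A χ²-based feature ranking can be sketched as below. The slides do not state how the statistic was computed; this example uses one common formulation for non-negative count features, comparing the feature total observed in each class with the total expected from class frequencies. The feature names and values are invented.

```python
def chi2_score(values, labels):
    """Chi-squared statistic between a non-negative feature and class labels."""
    n = len(values)
    total = sum(values)
    score = 0.0
    for cls in set(labels):
        # Feature mass actually observed in this class.
        observed = sum(v for v, y in zip(values, labels) if y == cls)
        # Feature mass expected if the feature were independent of the class.
        expected = total * sum(1 for y in labels if y == cls) / n
        if expected > 0:
            score += (observed - expected) ** 2 / expected
    return score


def rank_features(feature_columns, labels):
    """Feature names sorted from most to least class-dependent."""
    scores = {name: chi2_score(col, labels)
              for name, col in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: 4 posts, binary label (1 = has the persona, 0 = does not).
labels = [1, 1, 0, 0]
features = {"money_mentions": [3, 5, 0, 0], "hashtags": [1, 1, 1, 1]}
print(rank_features(features, labels))  # ['money_mentions', 'hashtags']
```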
Analysis and Conclusion
Averaged word2vec (for blogs) and the CNN model (for tweets)
outperform the other approaches.
The CNN-LSTM model fails to outperform the averaged word2vec
method, mainly due to its high number of trainable model
parameters.
Word embeddings with superior medical concept coverage do
not outperform the others. [Maybe coverage is not very
crucial for this task.]
Word embeddings trained purely on medical text (like
PubMed articles) do not outperform the others.
Lack of diversity of personae in the training data
Most of the data is generated by a few personae (like researchers
for PubMed)
Future Work
The current features are limited to a post's content; we would
like to explore other features,
such as social features (for example, the number of followers on Twitter).
We wish to experiment with distant-supervision-based
methods to obtain automatically labeled examples for
data-hungry models like CNN-LSTM.
Thank You !!
For any queries, please contact nikhil.pattisapu@research.iiit.ac.in

 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
 
How to Write a (Good) Research Paper
How to Write a (Good) Research Paper How to Write a (Good) Research Paper
How to Write a (Good) Research Paper
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBias
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...
 
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayPrivacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Leveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceLeveraging Social Media for Financial Advice
Leveraging Social Media for Financial Advice
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian Languages
 

Último

presentation by faizan[1] [Read-Only].pptx
presentation by faizan[1] [Read-Only].pptxpresentation by faizan[1] [Read-Only].pptx
presentation by faizan[1] [Read-Only].pptxkhfaizan534
 
Conventional vs Modern method (Philosophies) of Tunneling-re.pptx
Conventional vs Modern method (Philosophies) of Tunneling-re.pptxConventional vs Modern method (Philosophies) of Tunneling-re.pptx
Conventional vs Modern method (Philosophies) of Tunneling-re.pptxSAQIB KHURSHEED WANI
 
pulse modulation technique (Pulse code modulation).pptx
pulse modulation technique (Pulse code modulation).pptxpulse modulation technique (Pulse code modulation).pptx
pulse modulation technique (Pulse code modulation).pptxNishanth Asmi
 
A brief about Jeypore Sub-station Presentation
A brief about Jeypore Sub-station PresentationA brief about Jeypore Sub-station Presentation
A brief about Jeypore Sub-station PresentationJeyporess2021
 
Final PPT.ppt about human detection and counting
Final PPT.ppt  about human detection and countingFinal PPT.ppt  about human detection and counting
Final PPT.ppt about human detection and countingArbazAhmad25
 
First Review Group 1 PPT.pptx with slide
First Review Group 1 PPT.pptx with slideFirst Review Group 1 PPT.pptx with slide
First Review Group 1 PPT.pptx with slideMonika860882
 
Introduction to Data Structures .
Introduction to Data Structures        .Introduction to Data Structures        .
Introduction to Data Structures .Ashutosh Satapathy
 
Governors ppt.pdf .
Governors ppt.pdf                              .Governors ppt.pdf                              .
Governors ppt.pdf .happycocoman
 
Field Report on present condition of Ward 1 and Ward 2 of Pabna Municipality
Field Report on present condition of Ward 1 and Ward 2 of Pabna MunicipalityField Report on present condition of Ward 1 and Ward 2 of Pabna Municipality
Field Report on present condition of Ward 1 and Ward 2 of Pabna MunicipalityMorshed Ahmed Rahath
 
عناصر نباتية PDF.pdf architecture engineering
عناصر نباتية PDF.pdf architecture engineeringعناصر نباتية PDF.pdf architecture engineering
عناصر نباتية PDF.pdf architecture engineeringmennamohamed200y
 
electricity generation from food waste - based bioenergy with IOT.pptx
electricity generation from food waste - based bioenergy with IOT.pptxelectricity generation from food waste - based bioenergy with IOT.pptx
electricity generation from food waste - based bioenergy with IOT.pptxAravindhKarthik1
 
Evaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesEvaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesMark Billinghurst
 
introduction to python, fundamentals and basics
introduction to python, fundamentals and basicsintroduction to python, fundamentals and basics
introduction to python, fundamentals and basicsKNaveenKumarECE
 
Support nodes for large-span coal storage structures
Support nodes for large-span coal storage structuresSupport nodes for large-span coal storage structures
Support nodes for large-span coal storage structureswendy cai
 
12. Stairs by U Nyi Hla ngae from Myanmar.pdf
12. Stairs by U Nyi Hla ngae from Myanmar.pdf12. Stairs by U Nyi Hla ngae from Myanmar.pdf
12. Stairs by U Nyi Hla ngae from Myanmar.pdftpo482247
 
Artificial organ courses Hussein L1-C2.pptx
Artificial organ courses Hussein  L1-C2.pptxArtificial organ courses Hussein  L1-C2.pptx
Artificial organ courses Hussein L1-C2.pptxHusseinMishbak
 
EJECTOR REFRIGERATION CYCLE WITH THE INJECTION OF A HIGH DENSITY FLUID INTO A...
EJECTOR REFRIGERATION CYCLE WITH THE INJECTION OF A HIGH DENSITY FLUID INTO A...EJECTOR REFRIGERATION CYCLE WITH THE INJECTION OF A HIGH DENSITY FLUID INTO A...
EJECTOR REFRIGERATION CYCLE WITH THE INJECTION OF A HIGH DENSITY FLUID INTO A...marijomiljkovic1
 
Injection Power Cycle - The most efficient power cycle
Injection Power Cycle - The most efficient power cycleInjection Power Cycle - The most efficient power cycle
Injection Power Cycle - The most efficient power cyclemarijomiljkovic1
 
Caltrans District 8 Update for the CalAPA Spring Asphalt Conference 2024
Caltrans District 8 Update for the CalAPA Spring Asphalt Conference 2024Caltrans District 8 Update for the CalAPA Spring Asphalt Conference 2024
Caltrans District 8 Update for the CalAPA Spring Asphalt Conference 2024California Asphalt Pavement Association
 

Último (20)

presentation by faizan[1] [Read-Only].pptx
presentation by faizan[1] [Read-Only].pptxpresentation by faizan[1] [Read-Only].pptx
presentation by faizan[1] [Read-Only].pptx
 
Conventional vs Modern method (Philosophies) of Tunneling-re.pptx
Conventional vs Modern method (Philosophies) of Tunneling-re.pptxConventional vs Modern method (Philosophies) of Tunneling-re.pptx
Conventional vs Modern method (Philosophies) of Tunneling-re.pptx
 
pulse modulation technique (Pulse code modulation).pptx
pulse modulation technique (Pulse code modulation).pptxpulse modulation technique (Pulse code modulation).pptx
pulse modulation technique (Pulse code modulation).pptx
 
A brief about Jeypore Sub-station Presentation
A brief about Jeypore Sub-station PresentationA brief about Jeypore Sub-station Presentation
A brief about Jeypore Sub-station Presentation
 
Final PPT.ppt about human detection and counting
Final PPT.ppt  about human detection and countingFinal PPT.ppt  about human detection and counting
Final PPT.ppt about human detection and counting
 
First Review Group 1 PPT.pptx with slide
First Review Group 1 PPT.pptx with slideFirst Review Group 1 PPT.pptx with slide
First Review Group 1 PPT.pptx with slide
 
Introduction to Data Structures .
Introduction to Data Structures        .Introduction to Data Structures        .
Introduction to Data Structures .
 
Governors ppt.pdf .
Governors ppt.pdf                              .Governors ppt.pdf                              .
Governors ppt.pdf .
 
Field Report on present condition of Ward 1 and Ward 2 of Pabna Municipality
Field Report on present condition of Ward 1 and Ward 2 of Pabna MunicipalityField Report on present condition of Ward 1 and Ward 2 of Pabna Municipality
Field Report on present condition of Ward 1 and Ward 2 of Pabna Municipality
 
عناصر نباتية PDF.pdf architecture engineering
عناصر نباتية PDF.pdf architecture engineeringعناصر نباتية PDF.pdf architecture engineering
عناصر نباتية PDF.pdf architecture engineering
 
electricity generation from food waste - based bioenergy with IOT.pptx
electricity generation from food waste - based bioenergy with IOT.pptxelectricity generation from food waste - based bioenergy with IOT.pptx
electricity generation from food waste - based bioenergy with IOT.pptx
 
Evaluation Methods for Social XR Experiences
Evaluation Methods for Social XR ExperiencesEvaluation Methods for Social XR Experiences
Evaluation Methods for Social XR Experiences
 
introduction to python, fundamentals and basics
introduction to python, fundamentals and basicsintroduction to python, fundamentals and basics
introduction to python, fundamentals and basics
 
Support nodes for large-span coal storage structures
Support nodes for large-span coal storage structuresSupport nodes for large-span coal storage structures
Support nodes for large-span coal storage structures
 
12. Stairs by U Nyi Hla ngae from Myanmar.pdf
12. Stairs by U Nyi Hla ngae from Myanmar.pdf12. Stairs by U Nyi Hla ngae from Myanmar.pdf
12. Stairs by U Nyi Hla ngae from Myanmar.pdf
 
Artificial organ courses Hussein L1-C2.pptx
Artificial organ courses Hussein  L1-C2.pptxArtificial organ courses Hussein  L1-C2.pptx
Artificial organ courses Hussein L1-C2.pptx
 
EJECTOR REFRIGERATION CYCLE WITH THE INJECTION OF A HIGH DENSITY FLUID INTO A...
EJECTOR REFRIGERATION CYCLE WITH THE INJECTION OF A HIGH DENSITY FLUID INTO A...EJECTOR REFRIGERATION CYCLE WITH THE INJECTION OF A HIGH DENSITY FLUID INTO A...
EJECTOR REFRIGERATION CYCLE WITH THE INJECTION OF A HIGH DENSITY FLUID INTO A...
 
Injection Power Cycle - The most efficient power cycle
Injection Power Cycle - The most efficient power cycleInjection Power Cycle - The most efficient power cycle
Injection Power Cycle - The most efficient power cycle
 
FOREST FIRE USING IoT-A Visual to UG students
FOREST FIRE USING IoT-A Visual to UG studentsFOREST FIRE USING IoT-A Visual to UG students
FOREST FIRE USING IoT-A Visual to UG students
 
Caltrans District 8 Update for the CalAPA Spring Asphalt Conference 2024
Caltrans District 8 Update for the CalAPA Spring Asphalt Conference 2024Caltrans District 8 Update for the CalAPA Spring Asphalt Conference 2024
Caltrans District 8 Update for the CalAPA Spring Asphalt Conference 2024
 

Medical Persona Classification in Social Media

  • 1. Medical Persona Classification in Social Media
    Nikhil Pattisapu (1), Manish Gupta (1,2), Ponnurangam Kumaraguru (3), Vasudeva Varma (1)
    (1) IIIT Hyderabad, (2) Microsoft India, (3) IIIT Delhi
    Advances in Social Network Analysis and Mining 2017 (ASONAM 2017)
  • 2. Overview
    - Motivation
    - Problem Definition
    - Related Work
    - Dataset
    - Approach
    - Evaluation Metrics
    - Experiments
    - Results
    - Analysis and Conclusion
    - Future Work
  • 3. Motivation: What is a medical persona?
    Medical personae are the user groups and content providers of Web 2.0 applications in healthcare. Some examples:
    - Patient
    - Caretaker
    - Consultant
    - Journalist
    - Pharmacist
    - Researcher
    - Other
  • 4. Motivation
    Pharmaceutical firms use medical social media for drug marketing and pharmacovigilance.
    [Figure: Sample post from drugs.com describing a patient's experiences with the drug Keppra.]
  • 5. Motivation: Use cases
    A few use cases for identifying medical personae are listed below.
    - To gather information about drug usage, adverse events, benefits and side effects from patients.
    - To find out what kind of informational assistance caretakers seek and make that information readily available.
    - To identify key opinion leaders in a drug or disease area.
    - To find out if a doctor has patients who can take part in a clinical trial.
  • 6. Motivation: Use cases (continued)
    - To gather information from conversations between pharmacists and others to identify drug dosage, interactions and therapeutic effects.
    - To acquire or collaborate on technologies invented by researchers that can become part of the drug pipeline.
    - To gather information from journalists' surveys on the quality of life of patients.
  • 7. Problem Definition
    Given a social media post, identify the medical personae associated with it.
    We pose this as a multi-label text classification problem, where the label set is {Patient, Caretaker, Consultant, Journalist, Pharmacist, Researcher, Other}.
    There are two primary reasons for setting this up as a multi-label (as opposed to single-label) classification task:
    - A post may involve conversations between multiple personae, for example a blog describing a patient-consultant conversation.
    - A post may be ambiguous and hence could be mapped to more than one label by a human annotator.
  • 8. Related Work
    This problem is primarily related to two problems that are thoroughly studied in the literature:
    - Authorship Attribution: the task of determining the author of a particular document.
    - Automatic Genre Identification (AGI): the task of classifying documents by genre (which includes their form, structure, functional trait, communicative purpose, targeted audience and narrative style) rather than by the content, topics or subjects that the documents span.
  • 9. Related Work: State-of-the-art methods
    For both authorship attribution and AGI, supervised algorithms based on extensive feature engineering have been proposed. The top features include:
    - Word n-grams
    - Character n-grams
    - Common words
    - Function words
    - Part-of-speech tags
    - Document statistics (e.g. document length)
    - HTML tags
    - Stylistic features
    - Acronyms
    - Hashtag and reply mentions
  • 10. Related Work: Why can't existing methods be trivially adapted?
    - Different features need to be explored for the medical domain.
    - Unlike most methods proposed in the literature, our task is of the closed-set, multi-label type.
    - Each persona comprises several users and is therefore itself heterogeneous.
  • 11. Dataset
    [Figure: Dataset Collection. Query -> Blog / Tweet Search API -> Blogs / Tweets -> Noise Filtering & Deduplication -> Human Annotation -> Labeled Blogs / Tweets.]
    - Our dataset consists of both blogs and tweets.
    - Examples of queries include drug names: minocycline, qvar, gilenya.
    - Whenever using only drug names as queries returned a lot of irrelevant content, drug-disease pairs (e.g. acne minocycline) were used as queries.
    - We used 50 queries and retrieved 50 blogs and 30 tweets per query.
    - Noisy posts and retweets were removed.
  • 12. Dataset
    [Figure: Dataset Statistics]
    - 1581 blogs and 1025 tweets were annotated.
    - The inter-annotator agreement between 4 annotators was found to be 0.708 for blogs and 0.70 for tweets.
    - The label cardinality of blogs and tweets was 1.18 and 1.24 respectively.
    - The maximum label cardinality of a blog was 2 and that of a tweet was 3.
  • 13. Approach: Overview
    We first transform the multi-label task into one or more single-label tasks using:
    - Binary label transformation
    - Label powerset transformation
    We then use the following approaches to solve the resulting tasks:
    - N-gram approach
    - Feature Engineering
    - Averaged Word Vectors
    - CNN-LSTM
  • 14. Approach: Label transformation methods
    Binary Relevance Method
    - We train an individual classifier for each label.
    - Given an unseen sample, the combined model predicts all labels for which the respective classifiers predict a positive result.
    Label Powerset Method
    - We train one binary classifier for every label combination attested in the training set.
    - For an unseen example, prediction is done using a voting scheme.
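The two label transformations on slide 14 can be illustrated on the target side alone. The sketch below is a minimal illustration, not the authors' code: it only shows how multi-label annotations might be turned into binary relevance targets (one 0/1 vector per label) and label powerset targets (one class id per label combination seen in training). The toy annotations and all function names are hypothetical, and the classifiers themselves, as well as the voting scheme mentioned for label powerset, are left out.

```python
from typing import Dict, FrozenSet, List, Tuple

# Label set as given in the problem definition.
LABELS = ["Patient", "Caretaker", "Consultant", "Journalist",
          "Pharmacist", "Researcher", "Other"]


def binary_relevance_targets(label_sets: List[FrozenSet[str]]) -> Dict[str, List[int]]:
    """One 0/1 target vector per label; each would feed its own binary classifier."""
    return {label: [int(label in s) for s in label_sets] for label in LABELS}


def label_powerset_targets(
    label_sets: List[FrozenSet[str]],
) -> Tuple[List[int], List[FrozenSet[str]]]:
    """Map every distinct label combination seen in training to one class id."""
    classes: List[FrozenSet[str]] = []
    index: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for s in label_sets:
        if s not in index:
            index[s] = len(classes)
            classes.append(s)
        targets.append(index[s])
    return targets, classes


def combine_binary_predictions(per_label: Dict[str, int]) -> FrozenSet[str]:
    """Binary relevance decoding: keep every label whose classifier said 'positive'."""
    return frozenset(label for label, positive in per_label.items() if positive)


# Hypothetical toy annotations, not taken from the dataset.
train = [
    frozenset({"Patient"}),
    frozenset({"Patient", "Consultant"}),
    frozenset({"Researcher"}),
    frozenset({"Patient"}),
]

br_targets = binary_relevance_targets(train)
lp_targets, lp_classes = label_powerset_targets(train)
```

Binary relevance can predict label combinations never seen in training, whereas label powerset can only output combinations present in `lp_classes`; that trade-off is one reason to evaluate both.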
  • 15. Approach
    N-gram approach (Baseline)
    - Each document is represented as a TF-IDF vector over the entire vocabulary.
    - An SVM is trained to classify the document into one or more of the pre-defined personae.
    - Both word n-grams and character n-grams are used.
    Averaged Word Vectors
    \[ \text{document vector}(d_i) = \frac{\sum_{j} \text{word embedding}(w_{ij})}{\text{len}(p_i)} \qquad (1) \]
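Equation (1) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: len(p_i) is taken to be the number of tokens in the post, out-of-vocabulary tokens are assumed to add nothing to the sum, and the two-dimensional embeddings are made-up toy values rather than any of the pre-trained embeddings used in the work.

```python
from typing import Dict, List


def averaged_word_vector(tokens: List[str],
                         embeddings: Dict[str, List[float]],
                         dim: int) -> List[float]:
    """Sum the embeddings of the tokens and divide by the post length, as in Eq. (1)."""
    total = [0.0] * dim
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # assumption: unknown tokens contribute nothing to the sum
        for k in range(dim):
            total[k] += vec[k]
    if not tokens:
        return total  # assumption: an empty post maps to the zero vector
    return [value / len(tokens) for value in total]


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {
    "i": [1.0, 0.0],
    "am": [0.0, 1.0],
    "suffering": [1.0, 1.0],
}

doc_vector = averaged_word_vector(
    ["i", "am", "suffering", "pneumonia"], toy_embeddings, dim=2
)
```

The resulting fixed-length vector can be fed to any standard classifier, which keeps the number of trainable parameters small compared with a CNN-LSTM.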
  • 16. Approach: Word embedding details
    ID | Training Source        | Training Algorithm | #Dim | #Entries | Domain
    1  | Medical Tweets (ADR)   | Word2Vec           | 200  | 1344629  | Medical
    2  | Twitter                | GloVe              | 200  | 1193515  | Generic
    3  | Web crawl 1            | GloVe              | 300  | 2196018  | Generic
    4  | Web crawl 2            | GloVe              | 300  | 1917495  | Generic
    5  | PubMed, PMC, Wikipedia | Word2Vec           | 200  | 5443656  | Medical
    Table: Pre-trained Word Embedding Details
  • 17. Approach: Feature Engineering
    For this task, we manually engineered a total of 89 features, distributed across 6 feature types.
    Document-level features (4)
    - Capture generic features of a post.
    - Examples: number of sentences, average sentence length, average word length.
    - Pharmacist blogs are lengthier than Patient blogs.
    POS features (33)
    - Capture the distribution of different parts of speech in the document.
    - Example: number of adjectives.
    - A Consultant is 1.6 times more likely to use adjectives than a Journalist.
  • 18. Approach: Feature Engineering (continued)
    List lookup features (7)
    - Include the average frequency of terms which occur in the document as well as in a particular list.
    - Example: list of abusive words.
    - The terms MD, Dr., MBBS, FRCS and consultation fee were found to be more frequent in consultant blogs than in others.
    Syntactic features (7)
    - Capture the presence or absence of various classes of terms.
    - Examples: date, person, location, organization, time, money, and percentage amounts.
    - Researcher blogs contain more percentage mentions than others.
  • 19. Approach: Feature Engineering (continued)
    Semantic features (35)
    - Consist of many medical-domain-specific features.
    - Examples: number of disease mentions, drug mentions, chemical mentions, organ mentions.
    - The distribution across these features gives significant clues about the persona.
    - These features were extracted using MetaMap.
    Tweet-specific features
    - Consist of features that apply to tweets only.
    - Example: number of hashtags.
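A few of the simpler feature types from slides 17 to 19 can be sketched with the standard library alone. The code below is a rough illustration, not the authors' feature extractor: the regular expressions used for sentence and word splitting are assumptions, "average frequency" of list terms is interpreted here as the fraction of tokens that appear in the list, and the consultant term list is only the handful of terms named on the slide. POS, syntactic and MetaMap-based semantic features are not covered.

```python
import re
from typing import Dict, Iterable

WORD_RE = re.compile(r"[A-Za-z0-9']+")  # assumption: a simple notion of "word"


def document_level_features(text: str) -> Dict[str, float]:
    """Generic post statistics in the spirit of the document-level features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    n_sent, n_words = len(sentences), len(words)
    return {
        "num_characters": float(len(text)),
        "num_sentences": float(n_sent),
        "avg_sentence_length": n_words / n_sent if n_sent else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }


def list_lookup_feature(text: str, term_list: Iterable[str]) -> float:
    """Fraction of tokens that also occur in a given term list (an interpretation)."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    terms = {t.lower() for t in term_list}
    return sum(w in terms for w in words) / len(words)


def num_hashtags(text: str) -> int:
    """Tweet-specific feature: number of hashtags."""
    return len(re.findall(r"#\w+", text))


# Hypothetical inputs, for illustration only.
CONSULTANT_TERMS = ["MD", "Dr", "MBBS", "FRCS"]
blog_features = document_level_features("I took Keppra. It helped a lot.")
consultant_score = list_lookup_feature("Dr Rao MBBS saw me", CONSULTANT_TERMS)
hashtag_count = num_hashtags("Day 3 on #minocycline for #acne")
```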
  • 20. Approach: CNN Architecture
    For experiments related to tweets, we use the following CNN architecture.
    [Figure: CNN. Layers shown: Pre-trained Word Embedding Layer, Convolution Layer, Max-pooling Layer, Softmax / Sigmoid. Example input: "I am suffering pneumonia".]
  • 21. Approach: CNN-LSTM Architecture
    For experiments related to blogs, we use the following CNN-LSTM architecture.
    [Figure: CNN-LSTM. Components shown: Pre-trained Word Embedding Layer, Convolution Layer, Max-pooling Layer, Sequential Layer (LSTM, LSTM, LSTM), Softmax / Sigmoid Layer. Example input sentences: "I treated a patient", "He was suffering fever", "Hygiene highly impacts dengue".]
  • 22. Evaluation Metrics
    Each evaluation metric is defined on a per-instance basis and subsequently averaged over all instances to obtain the aggregate value. Let l and pr be the true label set and the predicted label set for document d.
    \[ \text{Exact Match} = \begin{cases} 1 & \text{if } l = pr \\ 0 & \text{otherwise} \end{cases} \qquad (2) \]
    \[ \text{Jaccard Similarity} = \frac{|l \cap pr|}{|l \cup pr|} \qquad (3) \]
    \[ \text{Precision} = \frac{|l \cap pr|}{|pr|} \qquad (4) \]
    \[ \text{Recall} = \frac{|l \cap pr|}{|l|} \qquad (5) \]
    \[ \text{F-Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6) \]
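  • 23. Evaluation Metrics (continued)
    \[ \text{Hamming Loss} = \frac{\sum_{j=1}^{|L|} \text{xor}(l_j, pr_j)}{|L|} \qquad (7) \]
    \[ \text{Hamming Score} = 1 - \text{Hamming Loss} \qquad (8) \]
    where l_j and pr_j denote the jth element of l and pr respectively, with l and pr viewed as binary indicator vectors over the label set L.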
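The per-instance metrics in Eqs. (2) to (8) translate directly into set operations. The following is a minimal sketch rather than the evaluation code used in the work; the conventions for empty label sets (returning 0 for precision, recall and F-score, and 1 for Jaccard when both sets are empty) are assumptions added here to avoid division by zero, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

LABELS = ("Patient", "Caretaker", "Consultant", "Journalist",
          "Pharmacist", "Researcher", "Other")


def instance_metrics(true: FrozenSet[str], pred: FrozenSet[str],
                     labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """Per-instance metrics following Eqs. (2)-(8)."""
    inter = len(true & pred)
    union = len(true | pred)
    precision = inter / len(pred) if pred else 0.0   # assumption for empty prediction
    recall = inter / len(true) if true else 0.0      # assumption for empty truth
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    # xor over the indicator vectors: count labels on which truth and prediction differ
    hamming_loss = sum((lab in true) != (lab in pred) for lab in labels) / len(labels)
    return {
        "exact_match": 1.0 if true == pred else 0.0,
        "jaccard": inter / union if union else 1.0,  # assumption: two empty sets match
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "hamming_loss": hamming_loss,
        "hamming_score": 1.0 - hamming_loss,
    }


def aggregate(pairs: List[Tuple[FrozenSet[str], FrozenSet[str]]]) -> Dict[str, float]:
    """Average every metric over all instances (expects at least one instance)."""
    rows = [instance_metrics(t, p) for t, p in pairs]
    return {name: sum(r[name] for r in rows) / len(rows) for name in rows[0]}


# Hypothetical example: the classifier finds the patient but misses the consultant.
truth = frozenset({"Patient", "Consultant"})
guess = frozenset({"Patient"})
example = instance_metrics(truth, guess)
summary = aggregate([(truth, guess), (truth, truth)])
```

In this example one of the seven labels is wrong, so the Hamming score stays high (6/7) even though the exact match is 0, which helps explain why Hamming scores in the results tables sit well above the exact-match values.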
  • 24. Experimental Details
    - Throughout this work, we conduct 10-fold cross-validation experiments.
    - For extracting semantic features we use MetaMap.
    - For tuning hyperparameters in the CNN and CNN-LSTM models, we used a grid search over the entire hyperparameter space, which includes:
      - Number of convolution filters
      - Filter sizes
      - Activation functions (ReLU and sigmoid)
      - Size of hidden layer
      - Number of epochs
    - We select the configuration which maximizes the F-Score on a hold-out validation set.
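The experimental protocol on slide 24 (10-fold cross-validation plus a grid search scored on held-out data) can be sketched as follows. This is only a skeleton under stated assumptions: the hyperparameter values in `space` are invented for illustration, the folds are contiguous and unshuffled, and `score_fn` is a hypothetical stand-in for training a model and measuring its validation F-Score.

```python
import itertools
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder over early folds
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def grid(space: Dict[str, Sequence[Any]]) -> Iterator[Dict[str, Any]]:
    """Enumerate every combination of hyperparameter values."""
    keys = list(space)
    for values in itertools.product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def select_best(space: Dict[str, Sequence[Any]],
                score_fn: Callable[[Dict[str, Any]], float]) -> Dict[str, Any]:
    """Return the configuration with the highest score (e.g. validation F-Score)."""
    return max(grid(space), key=score_fn)


# Hypothetical search space; the actual values used are not given on the slide.
space = {
    "num_filters": [64, 128],
    "filter_sizes": [(3,), (3, 4, 5)],
    "activation": ["relu", "sigmoid"],
    "hidden_size": [50, 100],
    "epochs": [5, 10],
}

folds = list(k_fold_indices(25, k=10))
# Dummy scoring function standing in for "train, then compute validation F-Score".
best = select_best(space, lambda cfg: cfg["num_filters"] - cfg["epochs"])
```

An exhaustive grid grows multiplicatively with each added hyperparameter (32 configurations here), so each configuration's training cost matters when the model is as large as a CNN-LSTM.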
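  • 25. Results: Blogs
    Approach            | LT Method | Emb Id | JS    | EM    | HS    | F-Score
    Word unigrams       | BR        | -      | 0.446 | 0.393 | 0.870 | 0.520
    Word unigrams       | LP        | -      | 0.566 | 0.511 | 0.865 | 0.570
    Character n-grams   | BR        | -      | 0.460 | 0.401 | 0.871 | 0.530
    Character n-grams   | LP        | -      | 0.577 | 0.523 | 0.868 | 0.580
    Feature Engineering | BR        | -      | 0.461 | 0.409 | 0.872 | 0.530
    Feature Engineering | LP        | -      | 0.574 | 0.518 | 0.867 | 0.580
    Averaged Word2Vec   | BR        | 3      | 0.608 | 0.521 | 0.880 | 0.600
    Averaged Word2Vec   | LP        | 4      | 0.627 | 0.568 | 0.886 | 0.640
    CNN-LSTM            | BR        | 3      | 0.496 | 0.421 | 0.846 | 0.460
    CNN-LSTM            | LP        | 3      | 0.586 | 0.514 | 0.869 | 0.600
    Table: Results of all Approaches for Blogs. LT Method = label transformation method (BR = binary relevance, LP = label powerset); Emb Id = word embedding ID; JS = Jaccard similarity; EM = exact match; HS = Hamming score.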
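  • 26. Results: Tweets
    Approach            | LT Method | Emb Id | JS    | EM    | HS    | F-Score
    Word unigrams       | BR        | -      | 0.427 | 0.352 | 0.862 | 0.500
    Word unigrams       | LP        | -      | 0.518 | 0.441 | 0.846 | 0.510
    Character n-grams   | BR        | -      | 0.421 | 0.353 | 0.864 | 0.480
    Character n-grams   | LP        | -      | 0.513 | 0.435 | 0.845 | 0.490
    Feature Engineering | BR        | -      | 0.450 | 0.366 | 0.865 | 0.520
    Feature Engineering | LP        | -      | 0.540 | 0.455 | 0.852 | 0.540
    Averaged Word2Vec   | BR        | 3      | 0.563 | 0.469 | 0.863 | 0.560
    Averaged Word2Vec   | LP        | 4      | 0.544 | 0.462 | 0.853 | 0.520
    CNN                 | BR        | 4      | 0.593 | 0.499 | 0.873 | 0.590
    CNN                 | LP        | 4      | 0.582 | 0.489 | 0.864 | 0.580
    Table: Results of all Approaches for Tweets. LT Method = label transformation method (BR = binary relevance, LP = label powerset); Emb Id = word embedding ID; JS = Jaccard similarity; EM = exact match; HS = Hamming score.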
  • 27. Analysis: Feature Analysis
    Feature Group  | Best Feature (Blogs)                       | Best Feature (Tweets)
    Document       | # characters (3)                           | # characters (8)
    Syntactic      | # Money mentions (2)                       | # Money mentions (6)
    List lookup    | # matching words with consultant list (1)  | # matching words with patient word list (29)
    Semantic       | # Inorganic chemical (38)                  | # research activity (34)
    POS            | # Foreign word (163)                       | # Personal Pronoun (116)
    Tweet specific | -                                          | # hashtags (9)
    Table: Feature Analysis for Blogs and Tweets based on the χ2 metric. The number in parentheses indicates the feature rank (lower is better).
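Ranking features by a χ2 statistic, as in the table above, can be illustrated with the textbook 2x2 contingency-table formula. The sketch below is a simplified illustration and not the procedure used in the work: it assumes each feature has been binarized (present or absent) and is scored against a single binary label, whereas the engineered features are counts and the task is multi-label. The feature names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def chi_square_2x2(feature_present: Sequence[bool], label_positive: Sequence[bool]) -> float:
    """Chi-square statistic of a 2x2 table: N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    a = b = c = d = 0
    for present, positive in zip(feature_present, label_positive):
        if present and positive:
            a += 1
        elif present and not positive:
            b += 1
        elif not present and positive:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # assumption: a constant feature or label carries no signal
    return n * (a * d - b * c) ** 2 / denom


def rank_features(features: Dict[str, Sequence[bool]],
                  label_positive: Sequence[bool]) -> List[Tuple[str, float]]:
    """Sort features by decreasing chi-square score; rank 1 is the most informative."""
    scores = {name: chi_square_2x2(col, label_positive) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical binarized features for four posts and a "Consultant" label.
is_consultant = [True, True, False, False]
toy_features = {
    "has_hashtag": [True, False, True, False],
    "mentions_consultation_fee": [True, True, False, False],
}
ranking = rank_features(toy_features, is_consultant)
```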
  • 28. Analysis and Conclusion
    - Averaged word2vec (for blogs) and the CNN model (for tweets) outperform the other approaches.
    - The CNN-LSTM model fails to outperform the averaged word2vec method, mainly due to its high number of trainable model parameters.
    - Word embeddings with superior medical concept coverage do not perform well against the others. (Coverage may not be crucial for this task.)
    - Word embeddings trained purely on medical text (like PubMed articles) do not outperform the others.
      - Lack of diversity of personae in the training data.
      - Most of the data is generated by a few personae (like researchers for PubMed).
  • 29. Future Work
    - The current features are limited to a post's content. We would like to explore other features such as social features, for example the number of followers on Twitter.
    - We wish to experiment with distant-supervision-based methods to obtain automatically labeled examples for data-hungry models like CNN-LSTM.
  • 30. Thank you! For any queries, please contact nikhil.pattisapu@research.iiit.ac.in