Committee:
Amit Sheth (Advisor)
T. K. Prasad
Michael Raymer
Jyotishman Pathak
(Cornell University)
Ph.D. Dissertation Defense
Knowledge Driven Search Intent Mining
Ashutosh Jadhav
April 18, 2016
2	
  
3.5 billion Web searches per day
One of the key aspects in building an intelligent search
engine is to understand users’ search intents
4	
  
Search Intent Mining !
Applications
Search Result Ranking
5	
  
Search Result Diversification
Search Personalization
Search Ads
Web Search Intent
6	
  
Search intent is a significant object/topic that
represents abstraction of users’ information needs.
Search Goals* (Why)
Search Topics (What)
7	
  
Search Intent Mining
Intent types: Search Goals, Search Topics
Data: Session data, Click-through, Query log
Techniques: Manual, Unsupervised, Supervised, Ontology-based, Knowledge driven (My Work)
Related Work: Broder 2002, Beeferman 2003, Rose and Levinson 2004, Baeza-Yates 2006, Hu et al. 2009, Sadikov 2010, Nanda 2014, Ustinovskiy 2013, White 2010, Joachims 2002, Lee 2005, Fujita 2010, Hu 2012, Broder 2007, Radlinski 2010, Celikyilmaz 2011, Shen 2006
Knowledge sources (My Work): Biomedical KB (UMLS); Crowd-sourced KB (Wikipedia); Dictionaries (Hunspell, OpenMedSpell)
8	
  
Search is shifting toward understanding intent
and serving objects
-Li et al., ACL, 2010
9	
  
10	
  
Health  Movie  Sports  Technology  Physics
Health
Diseases
Symptoms
Causes
Medications
Treatments
Prevention
 
	
  
Web Search for Health Information
Among all topics available on the
Internet, health is one of the most
important in terms of impact on the user
11	
  
•  Major Challenges
–  Consumers’ lack of medical knowledge to formulate health search queries
–  Search engines’ failure to understand users’ health search intents
12	
  
Challenges in Health Information Search
•  Health information search is a “trial-and-error” process.
	
  
•  Health search intent mining applications:
–  Personalized health information interventions
–  To get better understanding of consumers’ health
information needs
–  Targeted advertisements
Motivation: Real-world Applications
13	
  
Research Problem: Domain specific search intent mining
14	
  
Thesis Statement
Rich background knowledge from biomedical knowledge
bases and Wikipedia enables development of effective
methods for:
I.  Intent mining from health-related search queries in a
disease agnostic manner
II.  Efficient browsing of informative health information
shared on social media.
	
  
•  Focus: Consumer-oriented health search intent
•  Challenge: No standardized list of consumer-oriented health
intent classes
•  Approach:
–  Qualitative study (published in JMIR, impact factor 4.7)
Health Search Intent
15	
  
16	
  
Three focus groups
Study questions:
•  Motivation for using internet for health information seeking
•  What do they search? (search intent)
•  How do they search?
•  What are the challenges in the search?
	
  
•  Focus: Consumer-oriented health search intent
•  Challenge: No standardized list of consumer-oriented health
intent classes
•  Approach:
–  Qualitative study (published in JMIR, impact factor 4.7)
–  Health categories on popular health websites
–  Review of online health information seeking literature
–  Empirical data analysis
The intent classes and the classification scheme were reviewed and
validated by Mayo Clinic clinicians and domain experts
Health Search Intent
17	
  
Selection criteria:
•  Google PageRank, Alexa ranking,
•  Medical Library Association’s ranking (CAPHIS - Consumer
and Patient Health Information Section)
Selected websites:
Mayo Clinic, WebMD, MedlinePlus, CDC, HealthFinder.gov,
and Familydoctor.org.
19	
  
Intent Classes
1 Symptoms
2 Causes
3 Risks & Complications
4 Drugs and Medications
5 Treatments
6 Tests and Diagnosis
7 Food and Diet
8 Living with
9 Prevention
10 Side effects
11 Medical devices
12 Diseases and conditions
13 Age-group References
14 Vital signs
•  Allows the instances to be associated with more than one
class
•  Problem transformation methods (fit data to algorithm)
–  Transform the multi-label classification problem into one or
more single-label classification problems.
–  e.g., Binary Relevance, Label Powerset, and RAkEL (RAndom k-labELsets)
•  Algorithm adaptation methods (fit algorithm to data)
–  Extend specific learning algorithms in order to handle multi-label
data directly.
–  e.g., Tree-based boosting - AdaBoost.MR, ML-kNN, and Rank-SVM
21	
  
Multi-label Classification
Both these methods follow underlying principles of the
supervised learning approach and depend on training data. 	
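A minimal sketch of the Binary Relevance transformation named above, assuming each query comes with a set of intent labels. The function name `binary_relevance` and the label assignments in the sample data are illustrative assumptions, not taken from the dissertation.

```python
from typing import Dict, List, Set, Tuple


def binary_relevance(
    samples: List[Tuple[str, Set[str]]], labels: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Turn one multi-label dataset into one binary dataset per label."""
    return {
        label: [(text, int(label in assigned)) for text, assigned in samples]
        for label in labels
    }


# Hypothetical labelled queries (label assignments are assumptions).
samples = [
    ("medications for pulmonary hypertension",
     {"Drugs and Medications", "Diseases and conditions"}),
    ("ibuprofen heart rate", {"Drugs and Medications", "Vital signs"}),
    ("water on the knee", {"Diseases and conditions"}),
]
labels = ["Drugs and Medications", "Diseases and conditions", "Vital signs"]

per_label = binary_relevance(samples, labels)
for label, rows in per_label.items():
    print(label, [flag for _, flag in rows])
```

Each per-label dataset can then be handed to any single-label classifier, which is why this family of methods still depends on labelled training data.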
  
•  Manual, time consuming and labor intensive process
•  May require domain experts
•  Limited coverage
–  Training data should be a representative sample of the dataset
–  Very difficult to create a training dataset that can cover all
aspects (discriminative features) of the dataset
•  Generalization problem
–  Poor performance on unseen data
Challenges with Training Data Creation
22	
  
These challenges get amplified for multi-label
classification problems
	
  
In the context of health search intent mining problem
•  Training data for 14 intent classes
•  Need domain experts to label dataset
Supervised Classification Limitations
23	
  
Domain constraint: A classifier trained for one
disease may not work for other diseases
These challenges make supervised learning-based approaches infeasible for our problem
  
24	
  
Knowledge Driven Approach
Machine Processable Knowledge
Ontologies
Taxonomies
Dictionaries
Knowledge bases
25	
  
Ontology
Timeframe: early 2000
First patent on Semantic Web
More information at blog
Unified Medical Language System	
  
•  UMLS (Unified Medical Language System)
–  Collection of over 100 controlled vocabularies such as
MeSH, SNOMED_CT, NCI, and RxNorm
Biomedical Knowledge Base
28	
  
Metathesaurus: collection of concepts
Semantic Network: semantic types and semantic relationships
SPECIALIST Lexicon: biomedical terms and their variants
•  Concept identification consists of two primary tasks:
–  Concept recognition and concept mapping
–  Example : what are the medications for stomach pain?
Concepts: medication, stomach pain
Challenges
•  Lexical or orthographic variants e.g., (diet, dieting), (ICD9, ICD-9)
•  Misspelling, e.g., (pneumonia, neumonia)
•  Synonyms, e.g., (heart attack, myocardial infarction)
•  Abbreviations, e.g., (myocardial infarction, MI)
•  Identifying concept boundary e.g., (pain in stomach, stomach pain)
•  Contextual meanings, e.g., (discharge from hospital, discharge from
wound)
Concept Identification
29	
  
•  Medical concept identification tools
–  UMLS MetaMap, cTAKES, MedLEE, NCBO Annotator
•  UMLS MetaMap
–  Identifies UMLS Metathesaurus concepts from text
–  Semantic Type (e.g., disease or syndrome)
–  UMLS Concept (e.g., blood pressure and heart rate)
•  Example (UMLS Concept) [Semantic Type]
–  Phrase query: red wine heart attack
•  Red wine (Red wine) [Food]
•  Heart Attack (Myocardial Infarction) [Disease or Syndrome]
30	
  
Concept Identification
•  Phrase query: water on the brain
–  Water (Drinking Water) [Substance]
–  Brain (Brain) [Body Part, Organ, or Organ Component]
•  Actual Mapping should be
–  Water on the brain (Hydrocephalus) [Disease or
Syndrome]
Concept Identification Challenges
31	
  
Concept Identification Approach
32	
  
•  Advanced text analytics
–  Word Sense Disambiguation (WSD)
•  Process of identifying the meaning of a term in context
•  With the WSD advancement, concepts are identified by
considering the surrounding text
–  Maximal phrase detection
•  Process each input record as a single phrase in order to
identify more complex Metathesaurus terms
•  Consumer Health Vocabulary (CHV) from UMLS
•  Consumer Health Vocabulary (CHV)
–  Maps terms used by laypeople to medical terms
–  E.g. hair loss => Alopecia
•  Problem: CHV in UMLS is incomplete
•  Example: water on the knee
–  MetaMap mapping: Water thick-knee (Burhinus vermiculatus) [Bird]
•  Actual Mapping should be
–  Water on the knee (Knee effusion) [Disease or
Syndrome]
Consumer Health Vocabulary
33	
  
Major challenge for health search intent mining problem
•  Traditional approach
–  Identification of consumer-oriented terms from Medline search
log, PatientsLikeMe forum data
–  Manual review by healthcare professionals
Approach: leverage knowledge from Wikipedia
•  One of the most-used online health resources
•  Continuously updated with emerging health terms
•  Links consumer-oriented terms with health
professionals’ terms using semantic relationships
Consumer Health Vocabulary Generation
35	
  
36	
  
•  Wikipedia: Crowd sourced encyclopedia
Consumer Health Vocabulary Generation
39	
  
Step 1: Identification of health-related Wikipedia articles
Health Category → Candidate subcategories → Articles tagged with candidate subcategories → Health-related Wikipedia articles
Snippet 2: Knee effusion or swelling of the knee (colloquially
known as water on the knee) occurs when excess synovial
fluid accumulates in or around the knee joint.
Snippet 1: Hair loss, also known as alopecia or baldness,
refers to a loss of hair from the head or body.
40	
  
Consumer Health Vocabulary Generation
Step 2: Extraction of candidate pairs
Pair | Term | Semantic Relationship | Term
1 | hair loss | also known as | alopecia
2 | hair loss | also known as | baldness
3 | knee effusion | colloquially known as | water on the knee
4 | swelling of the knee | colloquially known as | water on the knee
5 | knee effusion | same as | swelling of the knee
43	
  
Wikipedia Patterns
also called | commonly called | colloquially known as
also known as | commonly known as | sometimes called
also referred to as | commonly termed | sometimes known as
also termed | previously known as | sometimes termed
commonly referred to as | colloquially referred to as | sometimes referred to as
Pattern-based information extractor
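The pattern-based extraction step can be sketched roughly as follows. This is an illustrative simplification, not the extractor used in the work: it assumes the left-hand term starts the sentence, that the right-hand term ends at the first comma, parenthesis, or period, and that alternatives are joined by "or". The function names are hypothetical, and the "same as" pair (row 5 of the candidate-pair table) is not produced.

```python
import re
from typing import List, Tuple

PATTERNS = [
    "also called", "commonly called", "colloquially known as",
    "also known as", "commonly known as", "sometimes called",
    "also referred to as", "commonly termed", "sometimes known as",
    "also termed", "previously known as", "sometimes termed",
    "commonly referred to as", "colloquially referred to as",
    "sometimes referred to as",
]

# Longest patterns first so the most specific phrase is tried first.
_PATTERN_RE = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(PATTERNS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
_TRIM = " ,;:()"


def _split_alternatives(text: str) -> List[str]:
    """Split 'a or b' into ['a', 'b'], trimming punctuation and case."""
    parts = (part.strip(_TRIM).lower() for part in re.split(r"\s+or\s+", text))
    return [part for part in parts if part]


def extract_candidate_pairs(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (term, relationship, term) triples for the first pattern found."""
    match = _PATTERN_RE.search(sentence)
    if match is None:
        return []
    relation = match.group(1).lower()
    left = sentence[: match.start()]
    # Assumption: the right-hand term ends at the first , ( ) or .
    right = re.split(r"[,().]", sentence[match.end():], maxsplit=1)[0]
    return [
        (l, relation, r)
        for l in _split_alternatives(left)
        for r in _split_alternatives(right)
    ]


snippet_1 = ("Hair loss, also known as alopecia or baldness, "
             "refers to a loss of hair from the head or body.")
snippet_2 = ("Knee effusion or swelling of the knee (colloquially known as "
             "water on the knee) occurs when excess synovial fluid "
             "accumulates in or around the knee joint.")
print(extract_candidate_pairs(snippet_1))
print(extract_candidate_pairs(snippet_2))
```

On the two snippets from the slides, this sketch yields candidate pairs 1 to 4 of the table.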
44	
  
Consumer Health Vocabulary Generation
Step 3: Identification of CHV and medical terms from the
candidate pairs
Map terms from the candidate pairs to UMLS Metathesaurus
using MetaMap
•  Scenario 1:
-  Both terms are present in the UMLS Metathesaurus
-  e.g., {hair loss, alopecia}
•  Scenario 2:
-  Neither term is present in the UMLS Metathesaurus
-  e.g., {hospital trust, acute trust}
•  Scenario 3:
-  Only one term is present in the UMLS Metathesaurus
-  e.g., {knee effusion, water on the knee}
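The three scenarios can be expressed as a small lookup, sketched below. The set `TOY_METATHESAURUS` is a hypothetical stand-in for a real MetaMap/UMLS lookup, and `pair_scenario` is an illustrative name; what is done with each scenario afterwards is not specified here.

```python
from typing import Callable, Tuple

# Toy stand-in for a UMLS Metathesaurus lookup (assumption for illustration).
TOY_METATHESAURUS = {"hair loss", "alopecia", "knee effusion"}


def in_toy_metathesaurus(term: str) -> bool:
    return term.lower() in TOY_METATHESAURUS


def pair_scenario(pair: Tuple[str, str], lookup: Callable[[str], bool]) -> int:
    """Return 1 if both terms are known, 2 if neither is, 3 if exactly one is."""
    hits = sum(1 for term in pair if lookup(term))
    return {2: 1, 0: 2, 1: 3}[hits]


for pair in [("hair loss", "alopecia"),
             ("hospital trust", "acute trust"),
             ("knee effusion", "water on the knee")]:
    print(pair, "-> Scenario", pair_scenario(pair, in_toy_metathesaurus))
```

In Scenario 3 the unmatched term is presumably the candidate consumer-oriented term for the matched medical term, as in {knee effusion, water on the knee}.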
•  Data:
–  Cardiovascular disease (CVD) related search queries
–  Limited to the United States
•  Data timeframe:
–  September 2011 to August 2013
•  Data collection tool:
–  IBM NetInsight On Demand
(Web Analytics tool)
•  Dataset size:
–  10.4 million CVD related search queries
–  Significantly large dataset for a single class of diseases
45
  
Dataset
•  Preprocessing
–  Stop word removal
–  Misspelling correction (using Hunspell spell checker)
•  Dictionaries: Hunspell dictionary, and its medical version,
OpenMedSpell
–  Replace all CHV terms from the search queries with medical
terms
•  UMLS MetaMap
–  Usage challenge: Significantly slow for millions of search queries
Data Processing
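A rough sketch of the preprocessing steps, under several assumptions: the stop-word list and CHV dictionary are tiny illustrative samples, spell correction (Hunspell/OpenMedSpell) is omitted because it needs an external dictionary, and CHV replacement is applied before stop-word removal so that multi-word terms such as "water on the knee" stay intact. That ordering is a choice made for this sketch, not something stated on the slide.

```python
import re
from typing import Dict, Set

# Illustrative samples only; real lists would be much larger.
STOP_WORDS: Set[str] = {"what", "are", "the", "for", "is", "a", "an", "of", "to"}
CHV_TO_MEDICAL: Dict[str, str] = {
    "hair loss": "alopecia",
    "water on the knee": "knee effusion",
}


def replace_chv_terms(query: str, chv: Dict[str, str]) -> str:
    """Replace consumer terms with medical terms, longest terms first."""
    for term in sorted(chv, key=len, reverse=True):
        query = re.sub(r"\b" + re.escape(term) + r"\b", chv[term], query)
    return query


def remove_stop_words(query: str, stop_words: Set[str]) -> str:
    return " ".join(word for word in query.split() if word not in stop_words)


def preprocess(query: str) -> str:
    query = query.lower().strip()
    query = re.sub(r"[^\w\s-]", " ", query)  # drop punctuation such as "?"
    query = replace_chv_terms(query, CHV_TO_MEDICAL)
    return remove_stop_words(query, STOP_WORDS)


print(preprocess("What are the treatments for water on the knee?"))
print(preprocess("hair loss causes"))
```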
46	
  
47	
  
Solution: Developed a scalable MetaMap implementation
using a Hadoop-MapReduce framework
•  Gold standard dataset
–  Two domain experts annotated randomly selected search queries
by labeling one search query with zero or more intent classes
–  Gold standard dataset is further divided into training and testing
•  Evaluation Metrics
–  Macro-averaged precision and recall
–  Average of the precision and recall of the classification algorithm
on different classes
–  To identify classification performance at class-level
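Macro averaging for a multi-label classifier can be sketched as below: precision and recall are computed per class and then averaged with equal weight. The gold and predicted label sets are made-up examples, and treating a class with no predictions (or no gold instances) as 0.0 is an assumption of this sketch. The F1 helper is the usual harmonic mean, which is consistent with the precision, recall, and F1 values reported in the results table.

```python
from typing import List, Sequence, Set, Tuple


def macro_precision_recall(
    gold: Sequence[Set[str]], predicted: Sequence[Set[str]], classes: List[str]
) -> Tuple[float, float]:
    """Average per-class precision and recall with equal class weight."""
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, predicted) if c in g and c in p)
        fp = sum(1 for g, p in zip(gold, predicted) if c not in g and c in p)
        fn = sum(1 for g, p in zip(gold, predicted) if c in g and c not in p)
        # Assumption: undefined ratios count as 0.0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


def f1_score(precision: float, recall: float) -> float:
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Made-up annotations: each query has zero or more intent classes.
gold = [{"Symptoms"}, {"Drugs and Medications", "Vital signs"}, set()]
predicted = [{"Symptoms", "Causes"}, {"Drugs and Medications"}, set()]
classes = ["Symptoms", "Causes", "Drugs and Medications", "Vital signs"]

print(macro_precision_recall(gold, predicted, classes))
print(round(f1_score(0.8842, 0.8607), 4))
```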
48	
  
Evaluation
•  Search Query Annotation
–  UMLS concepts and semantic types
•  Classification Rules
49	
  
Classification: Annotation and Rules
Intent Class: Drugs and Medications
Classification Rule: {ST ∪ SC ∪ KW} \ SC*
•  ST: ORCH|PHSU, CLND, PHSU
•  SC: medication, medicine, drugs, dose, dosage, tablet, pill
•  KW: meds
•  (Without) SC*: alcohol, caffeine, fruit, prevent
Examples:
•  medications for pulmonary hypertension
•  ibuprofen heart rate
•  dextromethorphan blood pressure
Abbreviations:
ORCH - Organic Chemical
PHSU - Pharmacologic Substance
CLND - Clinical Drug
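One way to read the rule is: a query belongs to the class if it matches any included semantic type (ST), concept (SC), or keyword (KW), unless it also matches an excluded concept (SC*). The sketch below follows that reading. The `AnnotatedQuery` structure and the hand-written annotations are assumptions for illustration; in the actual pipeline the semantic types and concepts would come from MetaMap.

```python
from dataclasses import dataclass, field
from typing import Set

INCLUDE_ST = {"ORCH|PHSU", "CLND", "PHSU"}
INCLUDE_SC = {"medication", "medicine", "drugs", "dose", "dosage", "tablet", "pill"}
INCLUDE_KW = {"meds"}
EXCLUDE_SC = {"alcohol", "caffeine", "fruit", "prevent"}


@dataclass
class AnnotatedQuery:
    text: str
    semantic_types: Set[str] = field(default_factory=set)
    concepts: Set[str] = field(default_factory=set)


def is_drug_intent(query: AnnotatedQuery) -> bool:
    """Apply {ST ∪ SC ∪ KW} minus SC* to one annotated query."""
    tokens = set(query.text.lower().split())
    included = bool(
        query.semantic_types & INCLUDE_ST
        or query.concepts & INCLUDE_SC
        or tokens & INCLUDE_KW
    )
    excluded = bool(query.concepts & EXCLUDE_SC)
    return included and not excluded


# Hand-written annotations (assumptions standing in for MetaMap output).
queries = [
    AnnotatedQuery("ibuprofen heart rate", {"ORCH|PHSU"}, {"ibuprofen", "heart rate"}),
    AnnotatedQuery("medications for pulmonary hypertension", set(),
                   {"medication", "pulmonary hypertension"}),
    AnnotatedQuery("meds for acid reflux", set(), {"acid reflux"}),
    AnnotatedQuery("alcohol heart disease", {"ORCH|PHSU"}, {"alcohol", "heart disease"}),
]
results = {q.text: is_drug_intent(q) for q in queries}
print(results)
```

The last query shows why the exclusion set matters: a semantic-type match alone would place "alcohol heart disease" in the Drugs and Medications class.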
50	
  
Classification : Evaluation Results
Rules Precision Recall F1 Score
ST (baseline approach) 0.5432 0.6203 0.5791
ST+SC 0.6534 0.6822 0.6674
ST+SC+KW 0.6722 0.6923 0.6821
ST+SC+KW-ST* 0.7383 0.7344 0.7363
ST+SC+KW-ST*-SC* 0.7601 0.7930 0.7762
ST+SC+KW-ST*-SC*+AdvTA 0.8539 0.8382 0.8459
ST+SC+KW-ST*-SC*+AdvTA+CHV 0.8842 0.8607 0.8723
ST = Semantic type SC = Semantic (UMLS) concepts KW = keyword
AdvTA = Advanced Text Analytic CHV = Consumer Health Vocabulary
For Drug and medication Intent Class
Correctly classified Wrongly classified
•  ibuprofen heart rate
•  dextromethorphan blood
pressure
•  medications for pulmonary
hypertension
•  alcohol heart disease
•  meds for acid reflux
51	
  
52	
  
Classification : Evaluation Results
For Drug and medication Intent Class
Correctly classified:
•  ibuprofen heart rate
•  dextromethorphan blood pressure
•  medications for pulmonary hypertension
•  meds for acid reflux
Wrongly classified:
•  alcohol heart disease
53	
  
Classification : Evaluation Results
For Drug and medication Intent Class
Correctly classified:
•  ibuprofen heart rate
•  meds for acid reflux
•  alcohol heart disease
•  medications for pulmonary hypertension
•  dextromethorphan blood pressure
54	
  
Classification : Evaluation Results
Rules Precision Recall F1 Score
ST+SC+KW-ST*-SC*+AdvTA 0.8539 0.8382 0.8459
•  Phrase query: water on the brain
–  Water (Drinking Water) [Substance]
–  Brain (Brain) [Body Part, Organ, or Organ Component]
•  Actual Mapping should be
–  Water on the brain (Hydrocephalus) [Disease or Syndrome]
•  Advanced Text Analytics
–  Word sense disambiguation, maximal phrase detection, CHV from
UMLS
55	
  
Classification : Evaluation Results
Rules Precision Recall F1 Score
ST+SC+KW-ST*-SC*+AdvTA+CHV 0.8842 0.8607 0.8723
•  Generating CHV from Wikipedia
•  Example: water on the knee
–  MetaMap mapping: Water thick-knee (Burhinus vermiculatus) [Bird]
•  Actual Mapping should be
–  Water on the knee (Knee effusion) [Disease or Syndrome]
•  Macro Average
–  Precision:0.8842, Recall: 0.8607 and F-Score: 0.8723
56	
  
Classification : Evaluation Results
To check the performance of the classification approach for
individual intent classes
No  Intent Class  Total Queries  Percentage Distribution
1 Diseases 4,232,398 40.66
2 Vital signs 3,455,809 33.20
3 Symptoms 1,422,826 13.67
4 Living with 1,178,756 11.32
5 Treatments 955,701 9.18
6 Food and Diet 779,949 7.49
7 Med Devices 665,484 6.39
8 Drugs and Medications 603,905 5.80
9 Causes 599,895 5.76
10 Tests & Diagnosis 344,747 3.31
11 Risks and Complication 277,294 2.66
12 Prevention 136,428 1.31
13 Age-group References 87,929 0.84
14 Side effects 25,655 0.25
Total 14,766,776 141.87 (exceeds 100% because a query can be classified into more than one intent class)
57	
  
Classification: Results
Distribution of search queries by number of intent
classes in which they are classified
0 classes: 8%
1 class: 48%
2 classes: 40%
3 classes: 4%
4 and more classes: 0%
58	
  
Classification: Results
Dataset Precision Recall F1-Score
Cardiovascular Diseases 0.8842 0.8607 0.8723
Diabetes 0.9274 0.8964 0.9116
Cancer 0.8294 0.7635 0.7950
59	
  
Classification: Results
Personalized eHealth Interventions
60
Application
61
•  Hello,
For the past 10 hours I've been expierencing a semi sharp pain in
my upper right chest just below my armpit. This pain appears
anywhere from every two and a half minutes to ten or fifteen
minutes. I also have some stomach ache and dry mouth. I monitor
my blood pressure is averages 130/90 with a average heart rate of
80. My cardiologist has been treating me since 1 year for high
colesterol, gout and hypertension with great success. Also I have
diabetes and I am taking Metformin and mevacor. I have an
appointment with my cardiologist after 2 weeks. However I am
wondering should I go to ER? BTW I am 69 years old male.
Scenario in Clinical Decision Support System
Source: DailyStrength forum
62
Demographic Information: Age: 69; Gender: Male
Symptom: chest pain, stomach ache, Xerostomia (dry mouth)
Diseases and Conditions: Gout, Hypertension, Diabetes
Drugs and Medication: Metformin, Mevacor
Vital Signs: Blood pressure: 130/90; Heart rate: 80
Misspellings: expierencing => experiencing; colesterol => cholesterol
Consumer Health Vocabulary: dry mouth => Xerostomia
•  Primary Symptom
–  Chest pain
•  upper side
•  Right side
•  Other symptoms
–  Stomach ache
–  Dry mouth
•  Current diseases
–  Hypertension
–  Gout
–  Diabetes
•  Vital Signs
–  Blood pressure = normal
–  Heart rate = normal
63
1.  Digestion-Related Causes
2.  Cardiovascular Problems
3.  Viral Infections
4.  Gallbladder Infection
5.  Pancreas Inflammation
6.  Liver Inflammation
7.  Pleurisy
8.  Lung Diseases
Symptoms for CVD
65	
  
Thesis Statement
Rich background knowledge from biomedical knowledge
bases and Wikipedia enables development of effective
methods for:
I.  Intent mining from health-related search queries in a
disease agnostic manner
II.  Efficient browsing of informative health information
shared on social media.
	
  
•  Intentional information seeking
–  Web search
•  Accidental information discovery
66
Information Acquisition
NASA’s Curiosity Rover on Mars
Accidentally bumping into (useful or
personal interest related) information
•  In many cases, the phenomenon of accidental information
discovery is facilitated by users’ prior actions – serendipity
•  Currently Twitter has thousands of health-centric accounts,
which are followed by millions of users to keep up with health
information
67
Health Information Acquisition
•  Every day, millions of tweets are shared
•  Most of these tweets are highly personal
and contextual
•  Only around 12% of posts are informative
•  Users have to manually identify informative
tweets
68
Research Problem: How to automate
the identification of signals (informative
tweets) from noise (Twitter stream)
Information Overload on Twitter
•  Informativeness of a tweet
depends upon reader’s
–  Intent
–  Knowledge about the information in the
tweet or novelty in the information
–  Interest in the subject
–  Who is the author (expert in a domain,
personal connection)
69
Informativeness of a Tweet is Subjective
Objectively what makes a tweet informative?
Tweets → Rule-based Filtering → Supervised Classification (Naïve Bayes classifier) → Informative Tweets
Experiments: Informativeness Analysis
Rule-based Filter | Criterion | Dataset
Experiment dataset | Diabetes | 40,000
Language | English | 29,034
URL | Yes | 17,422
Duplicate tweet | - | 13,573
Minimum length | Minimum number of words = 5 and characters = 80 | 10,927
Max spelling mistakes | 2 | 10,176
URL filtering | Remove broken/not working URLs; duplicate URLs | 8,273
Min URL PageRank | 5 | 6,374
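The first few filters in the table can be sketched as a simple pipeline. This is an illustration under assumptions: the `Tweet` structure is hypothetical, a duplicate is taken to mean the same text once URLs, case, and spacing are ignored, and the spelling, broken-URL, and PageRank filters are left out because they need external services.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

URL_RE = re.compile(r"https?://\S+")


@dataclass
class Tweet:
    text: str
    lang: str


def rule_based_filter(
    tweets: Iterable[Tweet], min_words: int = 5, min_chars: int = 80
) -> List[Tweet]:
    """Apply language, URL, duplicate, and minimum-length filters in order."""
    seen: Set[Tuple[str, ...]] = set()
    kept: List[Tweet] = []
    for tweet in tweets:
        if tweet.lang != "en":                      # Language = English
            continue
        if URL_RE.search(tweet.text) is None:       # URL = Yes
            continue
        # Assumption: duplicates match after dropping URLs, case, and spacing.
        key = tuple(URL_RE.sub("", tweet.text).lower().split())
        if key in seen:                             # Duplicate tweet
            continue
        seen.add(key)
        if len(tweet.text.split()) < min_words or len(tweet.text) < min_chars:
            continue                                # Minimum length
        kept.append(tweet)
    return kept


base = ("New study links regular exercise to better blood sugar control "
        "in adults with type 2 diabetes")
tweets = [
    Tweet(base + " http://example.org/a", "en"),
    Tweet(base.upper() + " http://example.org/b", "en"),   # duplicate text
    Tweet("Diabetes sucks http://example.org/c", "en"),    # too short
    Tweet(base, "en"),                                     # no URL
    Tweet(base + " http://example.org/d", "es"),           # not English
]
kept = rule_based_filter(tweets)
print(len(kept), "of", len(tweets), "tweets kept")
```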
Supervised Classification Features
Bag-of-words: Unigrams, bigrams
Text features:
•  Message length
•  Percentage of words, special characters
•  Part of speech tags
Author features:
•  Social connectivity (Number of follow-followers)
•  Activity level (Number of tweets)
•  Author credibility/influence (Klout score)
Popularity features: Number of tweets, retweets, Facebook shares, likes, comments, recommendations; Google Plus and LinkedIn shares
Reliability feature: URL PageRank
73
•  Randomly selected 40k tweets related to diabetes
•  Gold standard dataset
–  Randomly selected 3000 tweets
–  Annotation: three annotators independently rated each tweet
with an informativeness score (1-4, low to high)
–  Informative scores (1-4) then transformed into binary
scores
–  Label distribution: Informative: 33.6% non-informative:
66.4%
Experiments: Gold Standard Dataset
Approach | Sample space | Sample space reduction
Rule-based filtering | 6,374 | 84.25%
74
Evaluation: Supervised Classification (NB)
Features | Precision
Tweet | 66.20
Tweet + URL Title | 68.72
Tweet + URL Title + URL Content | 74.67
Tweet + URL Title + URL Content + Tweet Length | 74.92
Tweet + URL Title + URL Content + Tweet Length + Number of words | 75.79
(Tweet + URL Title + URL Content + Tweet Length + Number of words + Special chars) => FT1 | 76.83
FT1 + POS tags | 77.23
FT1 + POS tags + PageRank | 80.63
FT1 + POS tags + PageRank + social share | 80.66
FT1 + POS tags + PageRank + social share + Author Features | 80.93
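For readers unfamiliar with the classifier, here is a minimal multinomial Naïve Bayes over unigram bag-of-words features with add-one smoothing. It is a from-scratch sketch on made-up toy tweets, not the implementation or feature set evaluated above; the class and label names are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[Tuple[List[str], str]]) -> "NaiveBayes":
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, label in docs:
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(docs)
        return self

    def predict(self, tokens: List[str]) -> str:
        best_label, best_score = "", -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for word in tokens:
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy training data.
train = [
    ("study finds exercise lowers diabetes risk".split(), "informative"),
    ("new guideline diabetes treatment published".split(), "informative"),
    ("ugh my diabetes is annoying today".split(), "non-informative"),
    ("lol so much candy diabetes incoming".split(), "non-informative"),
]
model = NaiveBayes().fit(train)
print(model.predict("new study diabetes risk".split()))
print(model.predict("lol my candy".split()))
```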
75
Hadoop-MapReduce
Framework
Informativeness
Analysis
Semantic
Categorization
Soni, S. 2015. Domain specific document retrieval framework on near real-time
social health data. Thesis, Wright State University
76
Search and Explore
X Controls Cancer
X = diet, treatment, exercise
(Pattern-based Approach
leveraging domain
semantics)
Top Health News
Faceted search (based on intent
classification algorithm)
Learn about disease
Source: Mayo Clinic
Navigation: Home, Search & Explore, Top Health News, Tweet Traffic, Learn about Disease
Other work
77
78
Desktop vs. Mobile: mobile usage took over
Comparative Analysis of Expressions of Search
Intents From Personal Computers and Smart Devices
79
Twitris: Social Media Analytics Platform
•  Core component of more than $6 million in research funding
(NSF, NIH, AFRL)
•  NIH-R01 proposal (Mayo Clinic and Kno.e.sis, Wright State) ($2 Million)
–  Modeling Social Behavior for Healthcare Utilization and Outcomes in Depression
•  Air Force Research Lab (AFRL)
–  Geo-Social mash-up for situational awareness in a disaster response situation
•  Funded project: 2010-2011, Real-time Twitris
–  Social media analysis for situational awareness (Funded: 2011-2012)
–  WBI's Tec^Edge Innovation and Collaboration Center (Tec^Edge ICC)
•  Funded project: Summer 2010, Summer 2011
•  Mayo Clinic Meritorious Award
–  Healthcare trend surveillance using social networks and health search queries
(funded 2013)
–  What makes a health-related tweet informative (funded 2014)
Research Grants and Proposals
80
81
82	
  
Conclusion
Search Intent Mining Health Search Intent Mining
83	
  
Conclusion
Health Search Intent Mining
Identified consumer-oriented intent classes
Multi-label Classification Problem (L=14)
Supervised ML Knowledge-driven Approach
Semantics-based Intent Classification
-  Based on UMLS semantic types and concepts
-  Advanced text analytics
-  Consumer Health Vocabulary
Consumer Health Vocabulary Generation
-  Leveraged knowledge from Wikipedia
-  Maps CHV terms to medical terms
84	
  
Conclusion
Knowledge Driven Approach for Health Search Intent Mining
Concept
Identification
-  UMLS MetaMap
-  Advanced text
analytics
-  Consumer Health
Vocabulary
Personalized eHealth Interventions
85	
  
Conclusion
Information overload
on Twitter
Subjectivity
Adapted search intent mining algorithm to
enable efficient browsing of the health
information on Social Health Signals
	
  
Objectively what makes a tweet informative?
Publications
•  Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal. A Jadhav et al. AMIA Annual Symposium 2014
•  Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal. A Jadhav et al. Journal of Medical Internet Research (JMIR; impact factor 4.7)
•  Evaluating the Process of Online Health Information Searching: A Qualitative Approach to Exploring Consumer Perspectives. A Fiksdal, A Kumbamu, A Jadhav et al. Journal of Medical Internet Research (JMIR; impact factor 4.7)
•  Online Information Seeking for Cardiovascular Diseases: A Case Study from Mayo Clinic. A Jadhav et al. 25th European Medical Informatics Conference (MIE 2014)
•  Empowering Personalized Medicine with Big Data and Semantic Web Technology: Promises, Challenges, Pitfalls, and Use Cases. M Panahiazar, V Taslimi, A Jadhav et al. IEEE International Conference on Big Data (IEEE BigData 2014)
•  Comparative Analysis of Online Health Information Search by Device Type. A Jadhav et al. AMIA TBI/CRI 2014
•  An Analysis of Mayo Clinic Search Query Logs for Cardiovascular Diseases. A Jadhav et al. AMIA Annual Symposium 2014
•  What Information about Cardiovascular Diseases do People Search Online? A Jadhav et al. 25th European Medical Informatics Conference (MIE 2014)
•  Twitris: A System for Collective Social Intelligence. A Sheth, A Jadhav et al. Springer, Encyclopedia of Social Network Analysis and Mining (ESNAM), 2014
•  Twitris: Socially Influenced Browsing. A Jadhav et al. Semantic Web Challenge, International Semantic Web Conference (ISWC 2009)
•  Twitris 2.0: Semantically Empowered System for Understanding Perceptions From Social Data. A Jadhav et al. Semantic Web Challenge, International Semantic Web Conference (ISWC 2010)
•  Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences. M Nagarajan, K Gomadam, A Sheth, A Ranabahu, R Mutharaju, A Jadhav. Web Information Systems Engineering (WISE 2009)
•  Understanding Events Through Analysis Of Social Media. A Sheth, H Purohit, A Jadhav et al. Technical Report, Kno.e.sis Center, 2010
•  Twitris+: Social Media Analytics Platform for Effective Coordination. A Smith, A Sheth, A Jadhav et al. NSF SoCS Symposium, 2012
•  Patent on Context-Aware Information Recommendation, filed in January 2013
–  Patent filed based on HP summer 2011 internship work
–  Ashutosh Jadhav, Hamid Motahari, Susan Spence, Claudio Bartolini
References
•  Shen, D., Pan, R., Sun, J.-T., Pan, J. J., Wu, K., Yin, J., and Yang, Q. 2006. Query enrichment for web-query classification. ACM Transactions on Information Systems (TOIS) 24, 3, 320-352.
•  Shen, D., Sun, J.-T., Yang, Q., and Chen, Z. 2006. Building bridges for web query classification. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 131-138.
•  Sadikov, E., Madhavan, J., Wang, L., and Halevy, A. 2010. Clustering query refinements by user intent. In Proceedings of the 19th international conference on World wide web. ACM, 841-850.
•  Radlinski, F., Szummer, M., and Craswell, N. 2010. Inferring query intent from reformulations and clicks. In Proceedings of the 19th international conference on World wide web. ACM, 1171-1172.
•  Rose, D. E. and Levinson, D. 2004. Understanding user goals in web search. In Proceedings of the 13th international conference on World Wide Web. ACM, 13-19.
•  Nanda, A., Omanwar, R., and Deshpande, B. 2014. Implicitly learning a user interest profile for personalization of web search using collaborative filtering. In Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. Vol. 2. IEEE.
•  Soni, S. 2015. Domain specific document retrieval framework on near real-time social health data. Thesis, Wright State University.
•  Naaman, M., Boase, J., and Lai, C.-H. 2010. Is it really about me?: message content in social awareness streams. In Proceedings of the 2010 ACM conference on Computer supported cooperative work. ACM, 189-192.
•  White, R. W. and Horvitz, E. 2014. From health search to healthcare: explorations of intention and utilization via query logs and user surveys. JAMIA.
•  Celikyilmaz, A., Hakkani-Tür, D., and Tür, G. 2011. Leveraging web query logs to learn user intent via Bayesian discrete latent variable model. In Proceedings of ICML.
•  Sheth, A. 15 years of Semantic Search and Ontology-enabled Semantic Applications.
•  Sheth A, Avant D, Bertram C, inventors; Taalee, Inc., assignee. System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising. United States patent US 6,311,194. 2001 Oct 30.
•  Lu, C.-J. 2012. Accidental discovery of information on the user-defined social web: A mixed-method study. Ph.D. thesis, University of Pittsburgh.
•  Li, X. 2010. Understanding the semantic structure of noun phrase queries. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1337-1345.
•  Keselman, A., Smith, C. A., Divita, G., Kim, H., Browne, A. C., Leroy, G., and Zeng-Treitler, Q. 2008. Consumer health concepts that do not map to the UMLS: where do they fit? Journal of the American Medical Informatics Association 15, 4, 496-505.
•  Hu, J., Wang, G., Lochovsky, F., Sun, J.-T., and Chen, Z. 2009. Understanding user's query intent with Wikipedia. In Proceedings of the 18th international conference on World wide web. ACM.
•  Hu, Y., Qian, Y., Li, H., Jiang, D., Pei, J., and Zheng, Q. 2012. Mining query subtopics from search log data. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 305-314.
•  Fox, S. 2014. Pew Internet & American Life Project report. 2013. Pew Internet: Health. URL: http://www.pewinternet.org/fact-sheets/health-fact-sheet/
•  Broder, A. Z., Fontoura, M., Gabrilovich, E., Joshi, A., Josifovski, V., and Zhang, T. 2007. Robust classification of rare queries using web knowledge. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. ACM.
•  Broder, A. 2002. A taxonomy of web search. In ACM SIGIR Forum. Vol. 36. ACM, 3-10.
•  Baeza-Yates, R., Calderon-Benavides, L., and Gonzalez-Caro, C. 2006. The intention behind web queries. In String processing and information retrieval. Springer, 98-109.
Acknowledgement
Thank you
Disclaimer: All other trademarks, logos and images used in this
presentation belong to their respective owners.

 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Ashutosh Jadhav PhD Defense: Knowledge Driven Search Intent Mining

  • 1. Ph.D. Dissertation Defense: Knowledge Driven Search Intent Mining. Ashutosh Jadhav, April 18, 2016. Committee: Amit Sheth (Advisor), T. K. Prasad, Michael Raymer, Jyotishman Pathak (Cornell University)
  • 2. 3.5 billion Web searches per day
  • 3. 3.5 billion Web searches per day. One of the key aspects in building an intelligent search engine is to understand users’ search intents
  • 4. Search Intent Mining: Applications
  • 5. Search Result Ranking, Search Result Diversification, Search Personalization, Search Ads
  • 6. Web Search Intent: Search intent is a significant object/topic that represents an abstraction of users’ information needs. It covers search goals (why) and search topics (what)
  • 7. Search Intent Mining: Related Work and My Work
      •  Intent types: search goals, search topics
      •  Data: session data, click-through, query log
      •  Techniques: manual, unsupervised, supervised, ontology-based, knowledge-driven (my work)
      •  Related work: Broder 2002; Beeferman 2003; Rose and Levinson 2004; Baeza-Yates 2006; Hu et al. 2009; Sadikov 2010; Nanda 2014; Ustinovskiy 2013; White 2010; Joachims 2002; Lee 2005; Fujita 2010; Hu 2012; Broder 2007; Radlinski 2010; Celikyilmaz 2011; Shen 2006
      •  My work: biomedical KB (UMLS), crowd-sourced KB (Wikipedia), dictionaries (Hunspell, OpenMedSpell)
  • 8. “Search is shifting toward understanding intent and serving objects” (Li et al., ACL, 2010)
  • 10. Domains: Health, Movie, Sports, Technology, Physics. Within Health: Diseases, Symptoms, Causes, Medications, Treatments, Prevention
  • 11. Web Search for Health Information: Among all topics available on the Internet, health is one of the most important in terms of impact on the user
  • 12. Challenges in Health Information Search
      •  Health information search is a “trial-and-error” process
      •  Major challenges
          –  Consumers’ lack of medical knowledge to formulate health search queries
          –  Search engines’ failure to understand users’ health search intents
  • 13. Motivation: Real-world Applications
      •  Health search intent mining applications:
          –  Personalized health information interventions
          –  Better understanding of consumers’ health information needs
          –  Targeted advertisements
      Research Problem: Domain-specific search intent mining
  • 14. Thesis Statement: Rich background knowledge from biomedical knowledge bases and Wikipedia enables the development of effective methods for:
      I.  Intent mining from health-related search queries in a disease-agnostic manner
      II.  Efficient browsing of informative health information shared on social media
  • 15. Health Search Intent
      •  Focus: Consumer-oriented health search intent
      •  Challenge: No standardized list of consumer-oriented health intent classes
      •  Approach:
          –  Qualitative study (published in JMIR, impact factor 4.7)
  • 16. Health Search Intent: qualitative study with three focus groups. Study questions:
      •  Motivation for using the Internet for health information seeking
      •  What do they search for? (search intent)
      •  How do they search?
      •  What are the challenges in the search?
  • 17. Health Search Intent: full approach
      –  Qualitative study (published in JMIR, impact factor 4.7)
      –  Health categories on popular health websites
      –  Review of online health information seeking literature
      –  Empirical data analysis
      Website selection criteria: Google PageRank, Alexa ranking, and the Medical Library Association’s ranking (CAPHIS, Consumer and Patient Health Information Section)
      Selected websites: Mayo Clinic, WebMD, MedlinePlus, CDC, HealthFinder.gov, and Familydoctor.org
      The intent classes and the classification scheme were reviewed and validated by Mayo Clinic clinicians and domain experts
  • 19. Health Search Intent: intent classes
       1  Symptoms                   8  Living with
       2  Causes                     9  Prevention
       3  Risks & Complications     10  Side effects
       4  Drugs and Medications     11  Medical devices
       5  Treatments                12  Diseases and conditions
       6  Tests and Diagnosis       13  Age-group References
       7  Food and Diet             14  Vital signs
  • 21. Multi-label Classification
      •  Allows instances to be associated with more than one class
      •  Problem transformation methods (fit data to algorithm)
          –  Transform the multi-label classification problem into one or more single-label classification problems
          –  e.g., Binary Relevance, Label Powerset, and RAkEL (RAndom k-labELsets)
      •  Algorithm adaptation methods (fit algorithm to data)
          –  Extend specific learning algorithms to handle multi-label data directly
          –  e.g., tree-based boosting (AdaBoost.MR), ML-kNN, and Rank-SVM
      Both kinds of methods follow the underlying principles of supervised learning and depend on training data
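To make the problem transformation idea concrete, here is a minimal sketch of the Binary Relevance split: one multi-label dataset becomes one binary dataset per label, each of which could then be handed to any single-label learner. The function name and the toy queries are illustrative assumptions, not material from the dissertation.

```python
from typing import Dict, List, Set, Tuple


def binary_relevance_split(
    samples: List[Tuple[str, Set[str]]],
    labels: List[str],
) -> Dict[str, List[Tuple[str, int]]]:
    """Build one binary (text, 0/1) dataset per label from multi-label data."""
    return {
        label: [(text, int(label in tags)) for text, tags in samples]
        for label in labels
    }


# Hypothetical toy queries with hand-assigned intent classes.
samples = [
    ("medications for pulmonary hypertension",
     {"Drugs and Medications", "Diseases and conditions"}),
    ("chest pain causes", {"Symptoms", "Causes"}),
]
labels = ["Symptoms", "Causes", "Drugs and Medications", "Diseases and conditions"]

datasets = binary_relevance_split(samples, labels)
for label, rows in datasets.items():
    print(label, rows)
```

Each per-label dataset still needs its own labeled examples, which is the training-data dependence the slide points out.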
  • 22. Challenges with Training Data Creation
      •  Manual, time-consuming, and labor-intensive process
      •  May require domain experts
      •  Limited coverage
          –  Training data should be a representative sample of the dataset
          –  Very difficult to create a training dataset that covers all aspects (discriminative features) of the dataset
      •  Generalization problem
          –  Poor performance on unseen data
      These challenges are amplified for multi-label classification problems
  • 23. Supervised Classification Limitations, in the context of the health search intent mining problem
      •  Training data is needed for 14 intent classes
      •  Domain experts are needed to label the dataset
      •  Domain constraint: a classifier trained for one disease may not work for other diseases
      These challenges make supervised learning-based approaches infeasible for our problem
  • 24. Knowledge Driven Approach: machine-processable knowledge from ontologies, taxonomies, dictionaries, and knowledge bases
  • 25. Knowledge Driven Approach: ontology. Timeframe: early 2000; first patent on Semantic Web (more information at blog)
  • 28. Biomedical Knowledge Base: Unified Medical Language System (UMLS)
      •  Collection of over 100 controlled vocabularies such as MeSH, SNOMED_CT, NCI, and RxNorm
      •  Metathesaurus: collection of concepts
      •  Semantic Network: semantic types and semantic relationships
      •  SPECIALIST Lexicon: biomedical terms and their variants
  • 29. Concept Identification
      •  Concept identification consists of two primary tasks: concept recognition and concept mapping
      •  Example: “what are the medications for stomach pain?” Concepts: medication, stomach pain
      Challenges
      •  Lexical or orthographic variants, e.g., (diet, dieting), (ICD9, ICD-9)
      •  Misspellings, e.g., (pneumonia, neumonia)
      •  Synonyms, e.g., (heart attack, myocardial infarction)
      •  Abbreviations, e.g., (myocardial infarction, MI)
      •  Identifying concept boundaries, e.g., (pain in stomach, stomach pain)
      •  Contextual meanings, e.g., (discharge from hospital, discharge from wound)
  • 30. Concept Identification
      •  Medical concept identification tools: UMLS MetaMap, cTAKES, MedLEE, NCBO Annotator
      •  UMLS MetaMap
          –  Identifies UMLS Metathesaurus concepts in text
          –  Semantic type (e.g., disease or syndrome)
          –  UMLS concept (e.g., blood pressure and heart rate)
      •  Example, written as: phrase (UMLS concept) [semantic type]
          –  Phrase query: red wine heart attack
              •  Red wine (Red wine) [Food]
              •  Heart attack (Myocardial Infarction) [Disease or Syndrome]
  • 31. Concept Identification Challenges
      •  Phrase query: water on the brain
          –  Water (Drinking Water) [Substance]
          –  Brain (Brain) [Body Part, Organ, or Organ Component]
      •  Actual mapping should be
          –  Water on the brain (Hydrocephalus) [Disease or Syndrome]
  • 32. Concept Identification Approach
      •  Advanced text analytics
          –  Word Sense Disambiguation (WSD)
              •  Process of identifying the meaning of a term in context
              •  With WSD, concepts are identified by considering the surrounding text
          –  Maximal phrase detection
              •  Process each input record as a single phrase in order to identify more complex Metathesaurus terms
      •  Consumer Health Vocabulary (CHV)
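The effect of maximal phrase detection on the “water on the brain” example can be illustrated with a greedy longest-match lookup. This is only a sketch of the idea, not MetaMap's actual algorithm; the three-entry lexicon is a hypothetical stand-in for the UMLS Metathesaurus, and `longest_match` is an illustrative name.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: phrase -> (concept, semantic type).
# A real system would look phrases up in the UMLS Metathesaurus.
LEXICON: Dict[str, Tuple[str, str]] = {
    "water": ("Drinking Water", "Substance"),
    "brain": ("Brain", "Body Part, Organ, or Organ Component"),
    "water on the brain": ("Hydrocephalus", "Disease or Syndrome"),
}


def longest_match(query: str, lexicon: Dict[str, Tuple[str, str]] = LEXICON
                  ) -> List[Tuple[str, str, str]]:
    """Greedily map the longest known phrase starting at each position."""
    tokens = query.lower().split()
    found: List[Tuple[str, str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at position i first.
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in lexicon:
                concept, semantic_type = lexicon[phrase]
                found.append((phrase, concept, semantic_type))
                i = j  # continue after the matched span
                break
        else:
            i += 1  # no phrase starts here; skip this token
    return found


print(longest_match("water on the brain"))
print(longest_match("brain water"))
```

Matching word by word would return Drinking Water and Brain; treating the whole query as one candidate phrase first is what recovers Hydrocephalus.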
  • 33. Consumer Health Vocabulary (CHV)
      •  Maps terms used by laypeople to medical terms, e.g., hair loss => alopecia
      •  Problem: the CHV in UMLS is incomplete
      •  Example: “water on the knee” is mapped to Water thick-knee (Burhinus vermiculatus) [Bird]
      •  Actual mapping should be
          –  Water on the knee (Knee effusion) [Disease or Syndrome]
  • 34. Consumer Health Vocabulary: the incomplete CHV is a major challenge for the health search intent mining problem
  • 35. Consumer Health Vocabulary Generation
      •  Traditional approach
          –  Identification of consumer-oriented terms from Medline search logs and PatientsLikeMe forum data
          –  Manual review by healthcare professionals
      •  Approach: leverage knowledge from Wikipedia
          –  One of the most-used online health resources
          –  Continuously updated with emerging health terms
          –  Links consumer-oriented terms with health professionals’ terms using semantic relationships
  • 39. Consumer Health Vocabulary Generation, Step 1: Identification of health-related Wikipedia articles
      •  Wikipedia: crowd-sourced encyclopedia
      •  Health category => candidate subcategories => articles tagged with candidate subcategories => health-related Wikipedia articles
  • 40. Consumer Health Vocabulary Generation, Step 2: Extraction of candidate pairs
      Snippet 1: Hair loss, also known as alopecia or baldness, refers to a loss of hair from the head or body.
      Snippet 2: Knee effusion or swelling of the knee (colloquially known as water on the knee) occurs when excess synovial fluid accumulates in or around the knee joint.
  • 42. Consumer Health Vocabulary Generation, Step 2: candidate pairs (term, semantic relationship, term)
      1  hair loss; also known as; alopecia
      2  hair loss; also known as; baldness
      3  knee effusion; colloquially known as; water on the knee
      4  swelling of the knee; colloquially known as; water on the knee
      5  knee effusion; same as; swelling of the knee
  • 43. Consumer Health Vocabulary Generation, Step 2: pattern-based information extractor. Wikipedia patterns:
      also called; commonly called; colloquially known as
      also known as; commonly known as; sometimes called
      also referred to as; commonly termed; sometimes known as
      also termed; previously known as; sometimes termed
      commonly referred to as; colloquially referred to as; sometimes referred to as
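A pattern-based extractor over these phrases could look roughly like the sketch below. It is a simplification built for the two snippets shown in Step 2, not the extractor used in the dissertation: the regular expression, the splitting on “or”, and the name `extract_pairs` are all assumptions, and the “same as” pair (knee effusion, swelling of the knee) is deliberately not produced.

```python
import re
from typing import List, Tuple

PATTERNS = [
    "also called", "commonly called", "colloquially known as",
    "also known as", "commonly known as", "sometimes called",
    "also referred to as", "commonly termed", "sometimes known as",
    "also termed", "previously known as", "sometimes termed",
    "commonly referred to as", "colloquially referred to as",
    "sometimes referred to as",
]

# Longest alternatives first so the most specific phrase is preferred.
_ALTERNATION = "|".join(
    re.escape(p) for p in sorted(PATTERNS, key=len, reverse=True)
)
# left term(s), optional "," or "(", relationship phrase, right term(s)
_PAIR_RE = re.compile(
    r"^(?P<left>[^,(]+?)\s*[,(]?\s*(?P<rel>" + _ALTERNATION + r")\s+"
    r"(?P<right>[^,).]+)",
    re.IGNORECASE,
)


def _split_terms(text: str) -> List[str]:
    """Split 'a or b' into separate lower-cased terms."""
    return [t.strip().lower() for t in re.split(r"\s+or\s+", text) if t.strip()]


def extract_pairs(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (term, relationship, term) triples found at the sentence start."""
    match = _PAIR_RE.search(sentence.strip())
    if not match:
        return []
    rel = match.group("rel").lower()
    return [
        (left, rel, right)
        for left in _split_terms(match.group("left"))
        for right in _split_terms(match.group("right"))
    ]


snippet_1 = ("Hair loss, also known as alopecia or baldness, refers to a loss "
             "of hair from the head or body.")
snippet_2 = ("Knee effusion or swelling of the knee (colloquially known as water "
             "on the knee) occurs when excess synovial fluid accumulates in or "
             "around the knee joint.")
print(extract_pairs(snippet_1))
print(extract_pairs(snippet_2))
```

The sketch only handles definitions that open the sentence; real Wikipedia text would need sentence splitting and more tolerant term boundaries.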
  • 44. Consumer Health Vocabulary Generation, Step 3: Identification of CHV and medical terms from the candidate pairs
      Map terms from the candidate pairs to the UMLS Metathesaurus using MetaMap
      •  Scenario 1: both terms are present in the UMLS Metathesaurus, e.g., {hair loss, alopecia}
      •  Scenario 2: neither term is present in the UMLS Metathesaurus, e.g., {hospital trust, acute trust}
      •  Scenario 3: only one term is present in the UMLS Metathesaurus, e.g., {knee effusion, water on the knee}
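The three scenarios amount to a simple case split on whether each term of a pair can be mapped. A minimal sketch follows, assuming a lookup callable `in_umls` that stands in for a MetaMap query; the `KNOWN_TERMS` set and both function names are hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical stand-in for "this term maps to a UMLS Metathesaurus concept".
KNOWN_TERMS = {"hair loss", "alopecia", "knee effusion"}


def in_umls(term: str) -> bool:
    return term.lower() in KNOWN_TERMS


def scenario(pair: Tuple[str, str],
             lookup: Callable[[str], bool] = in_umls) -> int:
    """Return 1 if both terms map, 2 if neither maps, 3 if exactly one maps."""
    first, second = lookup(pair[0]), lookup(pair[1])
    if first and second:
        return 1
    if not first and not second:
        return 2
    return 3


for candidate in [("hair loss", "alopecia"),
                  ("hospital trust", "acute trust"),
                  ("knee effusion", "water on the knee")]:
    print(candidate, "-> scenario", scenario(candidate))
```

Scenario 3 is presumably the interesting case for vocabulary generation: the unmapped term (here “water on the knee”) is a candidate lay term for the mapped one.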
  • 45. Dataset
      •  Data: cardiovascular disease (CVD) related search queries, limited to the United States
      •  Data timeframe: September 2011 to August 2013
      •  Data collection tool: IBM NetInsight On Demand (Web analytics tool)
      •  Dataset size: 10.4 million CVD-related search queries, a very large dataset for a single class of diseases
  • 46. Data Processing
      •  Preprocessing
          –  Stop word removal
          –  Misspelling correction (using the Hunspell spell checker)
              •  Dictionaries: Hunspell dictionary and its medical version, OpenMedSpell
          –  Replace all CHV terms in the search queries with medical terms
      •  UMLS MetaMap
          –  Usage challenge: very slow for millions of search queries
  • 47. Data Processing, solution: developed a scalable MetaMap implementation using a Hadoop-MapReduce framework
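A per-query preprocessing function might be sketched as follows. The stop-word list and the lay-to-medical mapping are tiny hypothetical samples, spelling correction is omitted because it depends on the Hunspell dictionaries, and the sketch applies CHV replacement before stop-word removal (an assumption made here so that multi-word lay terms containing stop words, such as “water on the knee”, survive intact).

```python
import re

# Hypothetical samples; real lists would be far larger.
STOP_WORDS = {"what", "is", "are", "the", "for", "a", "of", "on"}
CHV_TO_MEDICAL = {
    "hair loss": "alopecia",
    "water on the knee": "knee effusion",
}


def preprocess(query: str) -> str:
    """Normalize a query: lower-case, map lay terms, drop stop words."""
    text = " ".join(query.lower().split())
    # Replace longer lay terms first so shorter ones cannot split them.
    for lay_term in sorted(CHV_TO_MEDICAL, key=len, reverse=True):
        pattern = r"\b" + re.escape(lay_term) + r"\b"
        text = re.sub(pattern, CHV_TO_MEDICAL[lay_term], text)
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


print(preprocess("What is the treatment for water on the knee"))
```

Because the function depends only on its input query, it is the kind of step that can run independently per record, which is what makes a map-style parallel job a natural fit for millions of queries.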
  • 48. Evaluation
      •  Gold standard dataset
          –  Two domain experts annotated randomly selected search queries, labeling each query with zero or more intent classes
          –  The gold standard dataset is further divided into training and testing sets
      •  Evaluation metrics
          –  Macro-averaged precision and recall: the average of the classification algorithm’s precision and recall over the different classes
          –  Used to assess classification performance at the class level
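Macro averaging for a multi-label task can be sketched as below: precision and recall are computed per class and then averaged with equal weight. The function name, the toy labels, and the choice to score an undefined ratio as 0.0 are assumptions for illustration.

```python
from typing import List, Sequence, Set, Tuple


def macro_precision_recall(
    gold: Sequence[Set[str]],
    predicted: Sequence[Set[str]],
    labels: List[str],
) -> Tuple[float, float]:
    """Average per-class precision and recall over all classes."""
    precisions, recalls = [], []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        # Assumption: an undefined ratio (no predictions / no gold) counts as 0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy example: three queries, two classes "A" and "B".
gold = [{"A"}, {"A", "B"}, set()]
predicted = [{"A"}, {"B"}, {"B"}]
print(macro_precision_recall(gold, predicted, ["A", "B"]))
```

Every class contributes equally to the average regardless of how many queries it contains, so rare intent classes are not drowned out by frequent ones.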
  • 49. Classification: Annotation and Rules
      •  Search query annotation: UMLS concepts and semantic types
      •  Classification rules, example for the intent class Drugs and Medications
          –  Rule: {ST ∪ SC ∪ KW} without SC*
          –  ST: ORCH|PHSU, CLND, PHSU
          –  SC: medication, medicine, drugs, dose, dosage, tablet, pill
          –  KW: meds
          –  SC* (excluded): alcohol, caffeine, fruit, prevent
          –  Example queries: medications for pulmonary hypertension; ibuprofen heart rate; dextromethorphan blood pressure
      Abbreviations: ORCH = Organic Chemical, PHSU = Pharmacologic Substance, CLND = Clinical Drug
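The Drugs and Medications rule can be read as “include if any semantic type, concept, or keyword matches, unless an excluded concept is present”. The sketch below encodes that reading; the annotations passed in are written by hand for illustration (they are not MetaMap output), and the dictionary layout and function name are assumptions.

```python
from typing import Dict, Set

DRUG_RULE: Dict[str, Set[str]] = {
    "ST": {"ORCH|PHSU", "CLND", "PHSU"},
    "SC": {"medication", "medicine", "drugs", "dose", "dosage", "tablet", "pill"},
    "KW": {"meds"},
    "SC_EXCLUDED": {"alcohol", "caffeine", "fruit", "prevent"},  # SC* on the slide
}


def is_drug_intent(query: str, semantic_types: Set[str], concepts: Set[str],
                   rule: Dict[str, Set[str]] = DRUG_RULE) -> bool:
    """Apply {ST or SC or KW} and then veto on excluded concepts."""
    tokens = set(query.lower().split())
    included = (
        bool(semantic_types & rule["ST"])
        or bool(concepts & rule["SC"])
        or bool(tokens & rule["KW"])
    )
    excluded = bool(concepts & rule["SC_EXCLUDED"])
    return included and not excluded


# Hand-written, hypothetical annotations for the example queries.
print(is_drug_intent("ibuprofen heart rate", {"PHSU"}, {"ibuprofen", "heart rate"}))
print(is_drug_intent("alcohol heart disease", {"ORCH|PHSU"}, {"alcohol", "heart disease"}))
print(is_drug_intent("meds for acid reflux", set(), {"acid reflux"}))
```

The exclusion list is what keeps a query such as “alcohol heart disease” out of the class even though alcohol carries a pharmacologic semantic type, and the keyword list catches informal wording such as “meds”.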
  • 50. Classification: Evaluation Results
      Rules                           Precision  Recall  F1 Score
      ST (baseline approach)          0.5432     0.6203  0.5791
      ST+SC                           0.6534     0.6822  0.6674
      ST+SC+KW                        0.6722     0.6923  0.6821
      ST+SC+KW-ST*                    0.7383     0.7344  0.7363
      ST+SC+KW-ST*-SC*                0.7601     0.7930  0.7762
      ST+SC+KW-ST*-SC*+AdvTA          0.8539     0.8382  0.8459
      ST+SC+KW-ST*-SC*+AdvTA+CHV      0.8842     0.8607  0.8723
      ST = semantic type; SC = semantic (UMLS) concepts; KW = keyword; AdvTA = advanced text analytics; CHV = Consumer Health Vocabulary
      Example queries for the Drugs and Medications intent class: ibuprofen heart rate; dextromethorphan blood pressure; medications for pulmonary hypertension; alcohol heart disease; meds for acid reflux
  • 53. Classification: Evaluation Results, Drugs and Medications intent class. Correctly classified: ibuprofen heart rate; meds for acid reflux; alcohol heart disease; medications for pulmonary hypertension; dextromethorphan blood pressure
  • 54. Classification: Evaluation Results, contribution of advanced text analytics (row ST+SC+KW-ST*-SC*+AdvTA: precision 0.8539, recall 0.8382, F1 0.8459)
      •  Phrase query: water on the brain
          –  Water (Drinking Water) [Substance]
          –  Brain (Brain) [Body Part, Organ, or Organ Component]
      •  Actual mapping should be
          –  Water on the brain (Hydrocephalus) [Disease or Syndrome]
      •  Advanced text analytics: word sense disambiguation, maximal phrase detection, CHV from UMLS
  • 55. Classification: Evaluation Results, contribution of the CHV generated from Wikipedia (row ST+SC+KW-ST*-SC*+AdvTA+CHV: precision 0.8842, recall 0.8607, F1 0.8723)
      •  Example: “water on the knee” is mapped to Water thick-knee (Burhinus vermiculatus) [Bird]
      •  Actual mapping should be
          –  Water on the knee (Knee effusion) [Disease or Syndrome]
  • 56. Classification: Evaluation Results
      •  Macro average: precision 0.8842, recall 0.8607, F-score 0.8723
      •  Purpose: to check the performance of the classification approach for individual intent classes
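As a quick consistency check, the reported F-score matches the harmonic mean of the reported macro-averaged precision and recall. The slide does not state how the F-score was aggregated, so treating it as the harmonic mean of the two macro averages is an assumption:

```latex
F_1 = \frac{2 P R}{P + R}
    = \frac{2 \times 0.8842 \times 0.8607}{0.8842 + 0.8607}
    = \frac{1.5221}{1.7449}
    \approx 0.8723
```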
  • 57. Classification: Results
      No  Intent class             Total queries  Percentage
       1  Diseases                 4,232,398      40.66
       2  Vital signs              3,455,809      33.20
       3  Symptoms                 1,422,826      13.67
       4  Living with              1,178,756      11.32
       5  Treatments                 955,701       9.18
       6  Food and Diet              779,949       7.49
       7  Med Devices                665,484       6.39
       8  Drugs and Medications      603,905       5.80
       9  Causes                     599,895       5.76
      10  Tests & Diagnosis          344,747       3.31
      11  Risks and Complication     277,294       2.66
      12  Prevention                 136,428       1.31
      13  Age-group References        87,929       0.84
      14  Side effects                25,655       0.25
          Total                   14,766,776     141.87
      The percentages add up to more than 100% because a query can be classified into more than one intent class
  • 58. Classification: Results. Distribution of search queries by the number of intent classes in which they are classified: 0 classes: 8%; 1 class: 48%; 2 classes: 40%; 3 classes: 4%; 4 and more: 0%
  • 59. Classification: Results
      Dataset                   Precision  Recall  F1-Score
      Cardiovascular Diseases   0.8842     0.8607  0.8723
      Diabetes                  0.9274     0.8964  0.9116
      Cancer                    0.8294     0.7635  0.7950
  • 61. Scenario in Clinical Decision Support System (source: DailyStrength forum)
      “Hello, For the past 10 hours I've been expierencing a semi sharp pain in my upper right chest just below my armpit. This pain appears anywhere from every two and a half minutes to ten or fifteen minutes. I also have some stomach ache and dry mouth. I monitor my blood pressure is averages 130/90 with a average heart rate of 80. My cardiologist has been treating me since 1 year for high colesterol, gout and hypertension with great success. Also I have diabetes and I am taking Metformin and mevacor. I have an appointment with my cardiologist after 2 weeks. However I am wondering should I go to ER? BTW I am 69 years old male.”
  • 62. Information identified in the post
      •  Demographic information: Age: 69; Gender: Male
      •  Symptoms: chest pain, stomach ache, xerostomia (dry mouth)
      •  Diseases and conditions: gout, hypertension, diabetes
      •  Drugs and medications: Metformin, Mevacor
      •  Vital signs: blood pressure 130/90; heart rate 80
      •  Misspellings: expierencing => experiencing; colesterol => cholesterol
      •  Consumer Health Vocabulary: dry mouth => Xerostomia
  • 63. Symptoms for CVD
      •  Primary symptom: chest pain (upper side, right side)
      •  Other symptoms: stomach ache, dry mouth
      •  Current diseases: hypertension, gout, diabetes
      •  Vital signs: blood pressure = normal; heart rate = normal
      1.  Digestion-Related Causes
      2.  Cardiovascular Problems
      3.  Viral Infections
      4.  Gallbladder Infection
      5.  Pancreas Inflammation
      6.  Liver Inflammation
      7.  Pleurisy
      8.  Lung Diseases
  • 65. Thesis Statement: Rich background knowledge from biomedical knowledge bases and Wikipedia enables the development of effective methods for:
      I.  Intent mining from health-related search queries in a disease-agnostic manner
      II.  Efficient browsing of informative health information shared on social media
  • 66. Information Acquisition
      •  Intentional information seeking
          –  Web search
      •  Accidental information discovery
          –  Accidentally bumping into useful or personally interesting information
      Image: NASA’s Curiosity Rover on Mars
  • 67. Health Information Acquisition
      •  In many cases, accidental information discovery is facilitated by users’ prior actions (serendipity)
      •  Twitter currently has thousands of health-centric accounts, which are followed by millions of users to keep up with health information
  • 68. Information Overload on Twitter
      •  Every day, millions of tweets are shared
      •  Most of these tweets are highly personal and contextual
      •  Only around 12% of posts are informative
      •  Users have to identify informative tweets manually
      Research Problem: How to automate the identification of signals (informative tweets) from noise (the Twitter stream)
  • 69. Informativeness of a Tweet is Subjective
      •  The informativeness of a tweet depends upon the reader’s
          –  Intent
          –  Knowledge about the information in the tweet, or the novelty of the information
          –  Interest in the subject
          –  Relationship to the author (expert in a domain, personal connection)
      Objectively, what makes a tweet informative?
  • 71. Experiments: Informativeness Analysis
      Pipeline: tweets => rule-based filtering => supervised classification (Naïve Bayes classifier) => informative tweets
      Rule-based filter       Setting                                                Remaining tweets
      Experiment dataset      Diabetes                                               40,000
      Language                English                                                29,034
      URL                     Yes                                                    17,422
      Duplicate tweet         Removed                                                13,573
      Minimum length          Minimum number of words = 5 and characters = 80        10,927
      Max spelling mistakes   2                                                      10,176
      URL filtering           Remove broken/not working URLs and duplicate URLs      8,273
      Min URL PageRank        5                                                      6,374
  • 72. Experiments: Informativeness Analysis, supervised classification features
      •  Bag-of-words: unigrams, bigrams
      •  Text features: message length; percentage of words and special characters; part-of-speech tags
      •  Author features: social connectivity (number of follows and followers); activity level (number of tweets); author credibility/influence (Klout score)
      •  Popularity features: number of tweets and retweets; Facebook shares, likes, comments, and recommendations; Google Plus and LinkedIn shares
      •  Reliability feature: URL PageRank
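The rule-based filtering stage can be sketched as a chain of simple predicates applied to each tweet. The thresholds below come from the filter table; the `Tweet` record, its field names, and the assumption that language, URL status, PageRank, and misspelling counts have already been computed upstream are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Tweet:
    """Hypothetical pre-annotated tweet record (field names are assumptions)."""
    text: str
    language: str
    url: Optional[str]
    url_reachable: bool
    url_pagerank: int
    spelling_mistakes: int


def rule_based_filter(tweets: Iterable[Tweet], min_words: int = 5,
                      min_chars: int = 80, max_misspellings: int = 2,
                      min_pagerank: int = 5) -> List[Tweet]:
    """Keep tweets that pass every filter from the table, in one pass."""
    seen_texts, seen_urls = set(), set()
    kept: List[Tweet] = []
    for tweet in tweets:
        normalized = " ".join(tweet.text.lower().split())
        passes = (
            tweet.language == "en"
            and tweet.url is not None
            and normalized not in seen_texts          # duplicate tweet
            and len(tweet.text.split()) >= min_words
            and len(tweet.text) >= min_chars
            and tweet.spelling_mistakes <= max_misspellings
            and tweet.url_reachable                   # broken URL
            and tweet.url not in seen_urls            # duplicate URL
            and tweet.url_pagerank >= min_pagerank
        )
        if passes:
            seen_texts.add(normalized)
            seen_urls.add(tweet.url)
            kept.append(tweet)
    return kept
```

Running cheap filters before the classifier shrinks the set of tweets that need feature extraction, which is the sample space reduction reported for this stage.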
  • 73. Experiments: Gold Standard Dataset
      •  Randomly selected 40k tweets related to diabetes
      •  Gold standard dataset
          –  Randomly selected 3,000 tweets
          –  Annotation: 3 annotators independently rate each tweet with an informative score (1-4, low to high)
          –  Informative scores (1-4) then transformed into binary scores
          –  Label distribution: informative 33.6%, non-informative 66.4%
      •  Sample space reduction by approach
          –  Rule-based filtering: sample space 6,374, sample space reduction 84.25%
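The slide does not say how the three 1-4 ratings were turned into a binary label. One plausible rule, shown here purely as an assumption, is to average the three ratings and call the tweet informative when the mean reaches 2.5; the function names and the threshold are illustrative.

```python
from statistics import mean


def binarize(scores, threshold=2.5):
    """Map three 1-4 ratings to True (informative) or False.

    The mean-with-threshold rule is an assumption, not the documented method.
    """
    if len(scores) != 3 or any(s not in (1, 2, 3, 4) for s in scores):
        raise ValueError("expected three ratings between 1 and 4")
    return mean(scores) >= threshold


def informative_share(labels):
    """Percentage of tweets labelled informative."""
    return 100.0 * sum(labels) / len(labels)
```

A majority vote over per-annotator binary labels would be an equally reasonable alternative; the choice affects the reported label distribution.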
  • 74. Evaluation: Supervised Classification (NB)
      Precision by feature set
          –  Tweet: 66.20
          –  Tweet + URL Title: 68.72
          –  Tweet + URL Title + URL Content: 74.67
          –  Tweet + URL Title + URL Content + Tweet Length: 74.92
          –  Tweet + URL Title + URL Content + Tweet Length + Number of words: 75.79
          –  Tweet + URL Title + URL Content + Tweet Length + Number of words + Special chars (= FT1): 76.83
          –  FT1 + POS tags: 77.23
          –  FT1 + POS tags + PageRank: 80.63
          –  FT1 + POS tags + PageRank + social share: 80.66
          –  FT1 + POS tags + PageRank + social share + Author Features: 80.93
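To make the evaluation concrete, here is a small self-contained sketch of a multinomial Naïve Bayes classifier over unigram and bigram bag-of-words features, together with the precision metric reported above. It is an illustration under assumptions: whitespace tokenisation and add-one smoothing are choices made here, and the extra features from the slide (URL content, POS tags, PageRank, author features) are not modelled.

```python
import math
from collections import Counter, defaultdict


def ngrams(text):
    """Lower-cased unigrams plus bigrams (whitespace tokenisation assumed)."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


class NaiveBayes:
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)        # class prior
            for f in ngrams(text):
                if f in self.vocab:                # ignore unseen features
                    score += math.log((counts[f] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision(predicted, actual, positive=True):
    """Share of items predicted positive that are actually positive."""
    flagged = [a for p, a in zip(predicted, actual) if p == positive]
    return sum(a == positive for a in flagged) / len(flagged) if flagged else 0.0
```

Adding a feature group from the table, such as URL title text, would amount to concatenating that text to the tweet before calling `ngrams`, which is one simple way to reproduce the incremental comparison.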
  • 75. Hadoop-MapReduce Framework: Informativeness Analysis, Semantic Categorization
      Source: Soni, S. 2015. Domain specific document retrieval framework on near real-time social health data. Thesis, Wright State University
  • 76. Application screenshot. Tabs: Home, Search & Explore, Top Health News, Tweet Traffic, Learn about Disease
      •  Search and Explore: "X Controls Cancer", X = diet, treatment, exercise (pattern-based approach leveraging domain semantics)
      •  Top Health News: faceted search (based on intent classification algorithm)
      •  Learn about disease (source: Mayo Clinic)
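The "X Controls Cancer" pattern on this slide could be approximated with a regular expression, as in the sketch below. This is a deliberately simplified stand-in: the slide describes a pattern-based approach that leverages domain semantics, whereas this example only matches literal words. The function names and the default relation word are assumptions.

```python
import re

# X values shown on the slide
FACETS = ("diet", "treatment", "exercise")


def build_pattern(disease, facets=FACETS, relation="controls"):
    """Compile a case-insensitive '<X> <relation> <disease>' pattern."""
    alternatives = "|".join(re.escape(f) for f in facets)
    return re.compile(
        rf"\b({alternatives})\b\s+{re.escape(relation)}\s+{re.escape(disease)}\b",
        re.IGNORECASE,
    )


def match_facet(tweet, disease):
    """Return the matched X value in lower case, or None if nothing matches."""
    m = build_pattern(disease).search(tweet)
    return m.group(1).lower() if m else None
```

For example, `match_facet("Study: Exercise controls cancer risk", "cancer")` picks out the facet word, while a tweet that only mentions the disease yields `None`.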
  • 78. Comparative Analysis of Expressions of Search Intents From Personal Computers and Smart Devices
      Figure: desktop vs. mobile usage; mobile usage took over
  • 79. Twitris: Social Media Analytics Platform
      •  Core component of more than $6 million in research funding (NSF, NIH, AFRL)
  • 80. Research Grants and Proposals
      •  NIH-R01 proposal (Mayo Clinic and Kno.e.sis, Wright State) ($2 Million)
          –  Modeling Social Behavior for Healthcare Utilization and Outcomes in Depression
      •  Air Force Research Lab (AFRL)
          –  Geo-Social mash-up for situational awareness in a disaster response situation
              •  Funded project: 2010-2011, Real-time Twitris
          –  Social media analysis for situational awareness (Funded: 2011-2012)
          –  WBI's Tec^Edge Innovation and Collaboration Center (Tec^Edge ICC)
              •  Funded project: Summer 2010, Summer 2011
      •  Mayo Clinic Meritorious Award
          –  Healthcare trend surveillance using social networks and health search queries (funded 2013)
          –  What makes a health-related tweet informative (funded 2014)
  • 82. Conclusion: Search Intent Mining, Health Search Intent Mining
  • 83. Conclusion: Health Search Intent Mining
      •  Identified consumer-oriented intent classes
      •  Multi-label Classification Problem (L=14)
      •  Supervised ML
      •  Knowledge-driven Approach
  • 84. Conclusion: Knowledge Driven Approach for Health Search Intent Mining
      •  Semantics-based Intent Classification
          –  Based on UMLS semantic types and concepts
          –  Advanced text analytics
          –  Consumer Health Vocabulary
      •  Consumer Health Vocabulary Generation
          –  Leveraged knowledge from Wikipedia
          –  Maps CHV terms to medical terms
      •  Concept Identification
          –  UMLS MetaMap
          –  Advanced text analytics
          –  Consumer Health Vocabulary
      •  Personalized eHealth Interventions
  • 85. Conclusion
      •  Information overload on Twitter
      •  Subjectivity: objectively, what makes a tweet informative?
      •  Adapted search intent mining algorithm to enable efficient browsing of health information on Social Health Signals
  • 86. Publications
      •  Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal. A Jadhav et al. AMIA Annual Symposium 2014
      •  Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal. A Jadhav et al. Journal of Medical Internet Research, JMIR (Impact factor 4.7)
      •  Evaluating the Process of Online Health Information Searching: A Qualitative Approach to Exploring Consumer Perspectives. A Fiksdal, A Kumbamu, A Jadhav et al. Journal of Medical Internet Research, JMIR (Impact factor 4.7)
      •  Online Information Seeking for Cardiovascular Diseases: A Case Study from Mayo Clinic. A Jadhav et al. 25th European Medical Informatics Conference (MIE 2014)
      •  Empowering Personalized Medicine with Big Data and Semantic Web Technology: Promises, Challenges, Pitfalls, and Use Cases. M Panahiazar, V Taslimi, A Jadhav et al. IEEE International Conference on Big Data (IEEE BigData 2014)
      •  Comparative Analysis of Online Health Information Search by Device Type. A Jadhav et al. AMIA TBI/CRI 2014
      •  An Analysis of Mayo Clinic Search Query Logs for Cardiovascular Diseases. A Jadhav et al. AMIA Annual Symposium 2014
      •  What Information about Cardiovascular Diseases do People Search Online? A Jadhav et al. 25th European Medical Informatics Conference (MIE 2014)
  • 87. Publications
      •  Twitris: a System for Collective Social Intelligence. A Sheth, A Jadhav et al. Springer, Encyclopedia of Social Network Analysis and Mining (ESNAM), 2014
      •  Twitris: Socially Influenced Browsing. A Jadhav et al. Semantic Web Challenge, International Semantic Web Conference (ISWC 2009)
      •  Twitris 2.0: Semantically Empowered System for Understanding Perceptions From Social Data. A Jadhav et al. Semantic Web Challenge, International Semantic Web Conference (ISWC 2010)
      •  Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences. M Nagarajan, K Gomadam, A Sheth, A Ranabahu, R Mutharaju, A Jadhav. Web Information Systems Engineering (WISE 2009)
      •  Understanding Events Through Analysis Of Social Media. A Sheth, H Purohit, A Jadhav et al. Technical Report, Kno.e.sis Center, 2010
      •  Twitris+: Social Media Analytics Platform for Effective Coordination. A Smith, A Sheth, A Jadhav et al. NSF SoCS Symposium, 2012
      •  Patent on Context-Aware Information Recommendation, filed in January 2013
          –  Patent filed based on HP summer 2011 internship work
          –  Ashutosh Jadhav, Hamid Motahari, Susan Spence, Claudio Bartolini
  • 88. References
      •  Shen, D., Pan, R., Sun, J.-T., Pan, J. J., Wu, K., Yin, J., and Yang, Q. 2006. Query enrichment for web-query classification. ACM Transactions on Information Systems (TOIS) 24, 3, 320-352.
      •  Shen, D., Sun, J.-T., Yang, Q., and Chen, Z. 2006. Building bridges for web query classification. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 131-138.
      •  Sadikov, E., Madhavan, J., Wang, L., and Halevy, A. 2010. Clustering query refinements by user intent. In Proceedings of the 19th international conference on World wide web. ACM, 841-850.
      •  Radlinski, F., Szummer, M., and Craswell, N. 2010. Inferring query intent from reformulations and clicks. In Proceedings of the 19th international conference on World wide web. ACM, 1171-1172.
      •  Rose, D. E. and Levinson, D. 2004. Understanding user goals in web search. In Proceedings of the 13th international conference on World Wide Web. ACM, 13-19.
      •  Nanda, A., Omanwar, R., and Deshpande, B. 2014. Implicitly learning a user interest profile for personalization of web search using collaborative filtering. In Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. Vol. 2. IEEE.
      •  Soni, S. 2015. Domain specific document retrieval framework on near real-time social health data. Thesis, Wright State University.
      •  Naaman, M., Boase, J., and Lai, C.-H. 2010. Is it really about me?: message content in social awareness streams. In Proceedings of the 2010 ACM conference on Computer supported cooperative work. ACM, 189-192.
      •  White, R. W. and Horvitz, E. 2014. From health search to healthcare: explorations of intention and utilization via query logs and user surveys. JAMIA.
      •  Celikyilmaz, A., Hakkani-Tür, D., and Tür, G. 2011. Leveraging web query logs to learn user intent via Bayesian discrete latent variable model. In Proceedings of ICML.
      •  Amit Sheth. 15 years of Semantic Search and Ontology-enabled Semantic Applications.
  • 89. References
      •  Sheth A, Avant D, Bertram C, inventors; Taalee, Inc., assignee. System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising. United States patent US 6,311,194. 2001 Oct 30.
      •  Lu, C.-J. 2012. Accidental discovery of information on the user-defined social web: A mixed-method study. Ph.D. thesis, University of Pittsburgh.
      •  Li, X. 2010. Understanding the semantic structure of noun phrase queries. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1337-1345.
      •  Keselman, A., Smith, C. A., Divita, G., Kim, H., Browne, A. C., Leroy, G., and Zeng-Treitler, Q. 2008. Consumer health concepts that do not map to the UMLS: where do they fit? Journal of the American Medical Informatics Association 15, 4, 496-505.
      •  Hu, J., Wang, G., Lochovsky, F., Sun, J.-T., and Chen, Z. 2009. Understanding user's query intent with Wikipedia. In Proceedings of the 18th international conference on World wide web. ACM.
      •  Hu, Y., Qian, Y., Li, H., Jiang, D., Pei, J., and Zheng, Q. 2012. Mining query subtopics from search log data. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 305-314.
      •  Fox, S. 2014. Pew internet & american life project report. 2013. Pew Internet: Health. URL: http://www.pewinternet.org/fact-sheets/health-fact-sheet/
      •  Broder, A. Z., Fontoura, M., Gabrilovich, E., Joshi, A., Josifovski, V., and Zhang, T. 2007. Robust classification of rare queries using web knowledge. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. ACM.
      •  Broder, A. 2002. A taxonomy of web search. In ACM SIGIR Forum. Vol. 36. ACM, 3-10.
      •  Baeza-Yates, R., Calderon-Benavides, L., and Gonzalez-Caro, C. 2006. The intention behind web queries. In String processing and information retrieval. Springer, 98-109.
  • 93. Thank you. Disclaimer: All other trademarks, logos and images used in this presentation belong to their respective owners.