eDrugTrends: Social Media Analysis to Monitor Cannabis Trends

Social Media Analysis to Monitor
Cannabis Trends
Presenter: Raminta Daniulaityte, Ph.D.
CITAR & Kno.e.sis,
Wright State University
Boonshoft School of Medicine
T32 Substance Abuse Seminar
(Public Health Seminar at Columbia University)
February 23, 2017
© Wright State University
Center for Interventions,
Treatment, and Addiction
Research (CITAR)
Ohio Center of Excellence in
Knowledge-enabled
Computing (Kno.e.sis)

Research Team
NIH/NIDA R01 DA039454
Trending: Social media analysis to monitor cannabis and synthetic cannabinoid use
Principal Investigators:
Raminta Daniulaityte, Ph.D., Center for Interventions, Treatment, and Addiction Research (CITAR), Boonshoft School of Medicine
Amit Sheth, Ph.D., Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis)
Co-Investigators:
Robert Carlson, Ph.D. (CITAR)
Silvia Martins, M.D., Ph.D. (Columbia U)
Ramzi Nahhas, Ph.D. (Comm. Health, WSU)
Edward Boyer, M.D., Ph.D. (U Mass)
Krishnaprasad Thirunarayan, Ph.D. (Kno.e.sis)
Research Staff:
Francois R. Lamy, Ph.D. (CITAR, Postdoc)
G. Alan Smith (Kno.e.sis, Software Engineer)
Sanjaya Wijeratne (Kno.e.sis, Ph.D. student)
Farahnaz Golroo (Kno.e.sis, Ph.D. student)
No Conflicts of Interest to declare

Project Aims
• Aim 1: Develop a comprehensive software platform, eDrugTrends, for
semi-automated processing and visualization of spatio-temporal and social
network dimensions of social media data (Twitter and Web forums) on
cannabis and synthetic cannabinoid use.
• Aim 2: Deploy eDrugTrends to identify and compare trends in knowledge,
attitudes, and behaviors related to cannabis and synthetic cannabinoid use
across U.S. regions with different cannabis legalization policies using Twitter
and Web forum data.
• Types of data sources:
o Twitter (brief content, but over 500 million tweets/day, geo-info)
o Web forums such as Bluelight, drugs-forum, Reddit (detailed discussions of drug
use practices)
o Web survey on Bluelight

Presentation Objectives
• Overview of the technical capabilities of eDrugTrends platform
to process Twitter data
• How data is collected
• Geo-location identification
• Keyword selection and monitoring
• Tweet content processing
• Exploration of recently collected and processed data on
marijuana concentrates
• Integrating geographic and content analysis features to
explore cannabis-related tweeting activity

Twitter Data Collection
• Tweets are collected using Twitter’s streaming Application Programming
Interface (API), which provides free access to up to 1% of all tweets.
• Publicly available tweets only.
• The system automatically filters out non-English language tweets.
• The current system started data collection in March 2015; close to 90 million
tweets have been collected.
[Figure: eDrugTrends dashboard showing incoming tweets and trending topics]

What does “up to 1%” mean?
• Free access to up to 1% of all tweets
o It can be thought of as a “bucket” that can fit up to 1% of all tweets.
o Assuming 400 million tweets are generated per day, 1% would constitute
about 4 million daily tweets.
o Still, it is possible to miss some tweets due to sudden volume spikes.
• With a reasonably limited number of keywords, all or
most relevant tweets can be collected.
• Our system collects an average of about 150,000 tweets
per day, which is below the allowable limit.
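
The arithmetic on this slide can be checked with a short sketch. It uses only the figures stated above (an assumed 400 million tweets per day, a 1% cap, and about 150,000 collected tweets per day); the function names are hypothetical and are not part of eDrugTrends.

```python
def daily_cap(total_daily_tweets: int, cap_percent: int = 1) -> int:
    """Upper bound on tweets per day that fit in the free-access 'bucket'."""
    # Integer arithmetic avoids floating-point rounding on large counts.
    return total_daily_tweets * cap_percent // 100


def share_of_cap(collected_per_day: int, cap: int) -> float:
    """Fraction of the bucket that the keyword stream actually fills."""
    return collected_per_day / cap


cap = daily_cap(400_000_000)        # assumed daily volume from the slide
usage = share_of_cap(150_000, cap)  # average daily collection from the slide
print(f"cap = {cap:,} tweets/day; stream uses {usage:.2%} of it")
```

Under these assumptions the keyword stream fills only a small fraction of the bucket, which is why most relevant tweets can be expected to come through outside of sudden volume spikes.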

Extraction of Geo-Location Information
• Tweets may contain GPS coordinates (via a mobile phone
that supports the feature).
• Users may indicate their geo information in their user profiles:
WHERE THE WEED AT
DAYTON, OH
SAN DIEGO
Pittsburgh, PA
wonderland
Earth
• eDrugTrends geo-locates close to 30% of tweets at the state
and county level.
• Some earlier studies reported that only 1-3% of tweets had identifiable geo-location.
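
A minimal sketch of how free-text profile locations like the ones above might be mapped to a state. This is not the eDrugTrends implementation: the lookup tables are tiny, hypothetical stand-ins for a full gazetteer, and county-level resolution is not attempted.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny lookup tables for illustration only.
STATE_ABBREVIATIONS = {"OH": "Ohio", "PA": "Pennsylvania", "CA": "California"}
KNOWN_CITIES = {
    "san diego": "California",
    "dayton": "Ohio",
    "pittsburgh": "Pennsylvania",
}


def state_from_profile(location: str) -> Optional[str]:
    """Return a state name for a profile location string, or None."""
    text = location.strip()
    # First try the "City, ST" pattern, e.g. "DAYTON, OH".
    match = re.search(r",\s*([A-Za-z]{2})$", text)
    if match:
        state = STATE_ABBREVIATIONS.get(match.group(1).upper())
        if state:
            return state
    # Otherwise fall back to a bare city name, e.g. "SAN DIEGO".
    return KNOWN_CITIES.get(text.lower())


for raw in ["WHERE THE WEED AT", "DAYTON, OH", "SAN DIEGO",
            "Pittsburgh, PA", "wonderland", "Earth"]:
    print(f"{raw!r} -> {state_from_profile(raw)}")
```

Entries such as "wonderland" or "Earth" resolve to nothing, which is one reason only a share of tweets can be geo-located.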

Adjusted Measures of Tweeting Activity
• To compare regional trends, we can’t work with raw numbers.
• eDrugTrends started running a parallel data collection system
to obtain a general sample of tweets (denominator data).
• General sample data are collected using another API stream;
no keywords are used; data are processed to identify
geographic information.
• “General sample” is then used to calculate state-tweet-
volume-adjusted state proportion of tweets
o (or county-tweet-volume-adjusted county proportion of tweets)
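
The slides do not spell out the adjustment formula, so the following is only one plausible reading: divide each state's share of topic tweets by its share of general-sample tweets, then rescale so the values sum to 100%. With 51 units (50 states plus DC) such values would average 100/51, or about 1.96, which happens to match the mean reported later in the deck, but the formula itself remains an assumption. The counts below are invented.

```python
def adjusted_state_proportions(topic_counts, general_counts):
    """Volume-adjusted percentage of topic tweets per state (sums to 100)."""
    topic_total = sum(topic_counts.values())
    general_total = sum(general_counts.values())
    ratios = {}
    for state, n_topic in topic_counts.items():
        topic_share = n_topic / topic_total
        general_share = general_counts[state] / general_total
        # Ratio > 1 means the state tweets about the topic more than its
        # overall tweeting volume would predict.
        ratios[state] = topic_share / general_share
    ratio_total = sum(ratios.values())
    return {state: 100 * r / ratio_total for state, r in ratios.items()}


# Invented example: state A has more topic tweets in raw numbers,
# but it also tweets far more overall.
topic = {"A": 300, "B": 100}
general = {"A": 3000, "B": 500}
print(adjusted_state_proportions(topic, general))
```

In this toy case state A leads on raw counts, yet state B has the higher adjusted proportion, which is the distortion the denominator data are meant to correct.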

RAW Numbers and ADJUSTED State Proportions of
Cannabis-Related Tweets
(March-September, 2016)
[Figure panels: Raw numbers; Adjusted proportions]

Twitter Data Collection: Keywords
• Keywords/slang terms are used to collect relevant tweets:
o Cannabis: weed, marijuana, spliff, ganja, kush, sativa, indica, chronic, blunt,
hydro, skunk, reefer, joint, etc.
o Marijuana concentrates: dabs, shatter, budder, wax, BHO, butane honey oil,
hash oil, etc.
o Edibles: weed cookies, space cake, pot cookie, pot brownie, mj brownie,
medibles, etc.
o Synthetic cannabinoids: spice, K2, CHMINACA, AB-FUBINACA, synthetic weed,
smoking blend, noid, black mamba, etc.
• Inclusion of slang terms improves sensitivity (recall) in data
collection

Keyword challenges
• Issues with “precision”: risk of getting “noisy” or “irrelevant” data.
• Ways to improve precision of collected data:
o Ambiguous terms are combined with additional keywords indicating usage (e.g., smoke
blunt, smoke budder)
o “Black list” words are used to exclude irrelevant tweets (e.g., pumpkin spice latte, Emily
Blunt).
o Machine learning and other advanced information processing techniques are needed
• On-going monitoring is needed:
o New types of products or slang terms emerge. For example, “rosin” is a new type of marijuana
concentrate produced using a solventless method.
o New uses or meanings of words may affect the accuracy of collected data (e.g., “dabs”).
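
The first two precision rules above can be sketched as a simple filter. The word lists are short, hypothetical samples drawn from terms mentioned on these slides, and the rule order (blacklist first) is an assumption rather than the documented eDrugTrends logic.

```python
import re

# Hypothetical sample lists; the real keyword sets are much larger.
UNAMBIGUOUS = {"marijuana", "ganja", "kush"}
AMBIGUOUS = {"blunt", "budder", "spice"}
USAGE_CUES = {"smoke", "smoking", "smoked"}
BLACKLIST_PHRASES = ("pumpkin spice", "emily blunt")


def is_relevant(tweet: str) -> bool:
    """Keep a tweet only if it passes the blacklist and keyword rules."""
    text = tweet.lower()
    # Rule 1: "black list" phrases exclude the tweet outright.
    if any(phrase in text for phrase in BLACKLIST_PHRASES):
        return False
    words = set(re.findall(r"[a-z0-9']+", text))
    # Rule 2: unambiguous terms are accepted on their own.
    if words & UNAMBIGUOUS:
        return True
    # Rule 3: ambiguous terms need an accompanying usage cue.
    return bool(words & AMBIGUOUS) and bool(words & USAGE_CUES)


for example in ["Time to smoke a blunt",
                "Emily Blunt was great in that movie",
                "pumpkin spice latte season",
                "That was a blunt remark"]:
    print(f"{example!r} -> {is_relevant(example)}")
```

Checking the blacklist first trades some recall for precision: a relevant tweet that also mentions a blacklisted phrase is dropped. Rules like these cannot separate senses of a single word such as “dabs”, which is where a trained classifier becomes necessary.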

Data Processing: Automated Tweet Classification
• Using manually annotated training
data sets, machine learning
classifiers were developed to
automatically classify tweets
• Classification by the source/type
of communication (personal, media,
retail)
o Machine learning classifier (SVM)
achieved F score = 0.81.
• Classification by sentiment
(positive, negative, neutral),
o Sentiment classification is applied
to personal communications only
o Machine learning classifier (SVM)
achieved F score = 0.71.
Kickin back wit my spliff
Late night dabs
Medical marvel: the uses of cannabis continue to grow http://t.co/djtKPunxW9
$10 #Cannabis #Edibles 12 Varieties 1 Package 10MG #THC total http://t.co/9w3xrFUnAe
Positive:
Marijuana works wonders on the soul
Strongest shatter I've ever smoked
Negative:
I’m not much of a fan when it comes to edibles
hate when people think i smoke weed

Exploring Twitter Data on
Marijuana Concentrates

Initial report about marijuana concentrate-related
tweeting: “Time for dabs”
2014 data
• Data collected over a 2-month period at the end of 2014.
• 27,018 tweets with identifiable state-level geo-location
• Although over 10 keywords were used (shatter, concentrates, butane
hash oil, etc.), keyword “dabs” produced over 99% of the total sample.
Dabs on Dabs on Dabs
Time for dabs
I just need a cute girl to take dabs with me and get stoned together
"Time for dabs": Analyzing Twitter data on marijuana concentrates across the U.S.
Daniulaityte R., Nahhas R.W., Wijeratne S., Carlson R.G., Lamy F.R., Martins S.S., Boyer E.W., (...), Sheth A.
(2015) Drug and Alcohol Dependence, 155 , pp. 307-311.

2015: Increases in Marijuana Concentrate-Related
Tweeting Activity? Oops! Not So Fast…
[Figure: Marijuana Concentrates US Tweeting Activity, Jun-Nov 2015. Weekly x-axis from Jun 8th to Nov 23rd; y-axis from 0 to 14,000; series: Tweets, Unique users]

Issues with Collected Data
Drug vs. Dance
Cam Newton cheers on Kevin Hart in a bench press challenge…then Dabs
Tell me why my mom DABS so well? https://t.co/7LZjdqBkQr
Cam celebrates, Cam dabs, Cam does Cam thing

Development of Machine Learning Classifier to
Extract Relevant Tweets
• Machine learning (ML) classifier was developed using 1,000 manually
labeled tweets
• Excellent results:
o ML classifier (NB) achieved F measure = 0.9; Kappa statistic = 0.8
• Dabs ML classifier was plugged into the system.
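
Reading “NB” as Naive Bayes (an assumption; the slides do not expand the abbreviation or describe the features), the drug-versus-dance separation can be illustrated with a toy bag-of-words classifier. The six training sentences are invented, loosely modeled on the example tweets in this deck, and stand in for the 1,000 manually labeled tweets.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocabulary:  # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, not taken from the labeled set.
texts = ["late night dabs and good wax",
         "strongest shatter dabs i ever smoked",
         "time for dabs pass the rig",
         "cam newton dabs after the touchdown",
         "my mom dabs so well at the dance",
         "he dabs to celebrate the win"]
labels = ["drug", "drug", "drug", "dance", "dance", "dance"]
model = TinyNaiveBayes().fit(texts, labels)
print(model.predict("smoked some dabs late night"))
print(model.predict("cam dabs after the win"))
```

The word “dabs” appears equally often in both classes here, so the decision rests entirely on the surrounding words, which is the same signal a human annotator uses.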

Marijuana Concentrate-Related Tweeting Over Time
[Figure panels: End of 2014; Start of 2017]
• Similar geographic patterns remained
• 96% were personal communication tweets (2017 data)
• Decrease in variability across states

Emerging Product: Rosin Tech
• Rosin technique is a solventless method to produce marijuana
concentrates
• Involves use of pressure and heat (e.g., hair straightener or rosin
tech press) to produce concentrates
• Occurrences of ‘rosin’ mentions in the eDrugTrends stream (March 2015 to
September 2016), before the “rosin” keyword was added

Rosin dabs: Preliminary data
• Keyword “rosin” (excluding violin, brass, bow); time period: December 6,
2016 to February 22, 2017; 3,471 tweets collected (with identifiable state-level
geo-location)
YOOOO JUST PRESSED FOR THE FIRST TIME AND IT WAS LIFE
CHANGING 🙏🙏🙏🙏🔥😩 flower rosin is the new fav
The future is bright for #Rosin. #Marijuana #Cannabis
Nice chunk of rosin to start this morning off
2017 goal....buy a house & rosin press.
Marijuana rosin, and increasingly common extract:
https://t.co/tXZNErOPta
Rosin Tech Hash Is perfect for the people in non medical marijuana
states where it's hard to come across quality BHO to dab.

Adjusted Proportions of Rosin-Related Tweets
(Preliminary data, Dec. 6, 2016-Feb. 22, 2017)
84% - personal communication tweets
8% - media related
8% - retail related
Great Variability:
Mean: 1.96; Variance: 2.5

Exploring Cannabis-Related Tweeting
Activity: Combining Content and
Geographic Analysis Features

Cannabis Data, March–May, 2016
• Between March and May of 2016, the eDrugTrends
platform collected 13,233,837 cannabis-related tweets.
• About 30% (N=3,948,402) of those tweets had
identifiable state-level geo-location information.
• These U.S.-based tweets were posted by 965,610
unique users.

Content Classification and Analysis
• Tweet content was automatically classified by:
A. source (personal communication, media, retail)
B. sentiment (positive, negative, neutral).
• States were grouped by cannabis legalization policies into
“recreational,” “medical, less restrictive,” “medical, more
restrictive,” and “illegal.”
• Permutation tests were performed to analyze differences
among the four groups in:
A. adjusted state proportions of all tweets,
B. personal communications only,
C. positive-to-negative sentiment ratios.
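
A minimal sketch of a permutation test across several groups. The slides do not name the test statistic, so the range of group means used here is an assumption, as are the function names and the invented values standing in for adjusted state proportions.

```python
import random


def range_of_group_means(groups):
    """Test statistic: largest group mean minus smallest group mean."""
    means = [sum(g) / len(g) for g in groups]
    return max(means) - min(means)


def permutation_test(groups, n_permutations=2000, seed=0):
    """Estimate a p-value by shuffling values across the group labels."""
    rng = random.Random(seed)
    observed = range_of_group_means(groups)
    pooled = [value for group in groups for value in group]
    sizes = [len(group) for group in groups]
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:  # re-cut the shuffled pool into the same sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if range_of_group_means(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented values for four policy groups, for illustration only.
policy_groups = [[1, 2, 3], [11, 12, 13], [21, 22, 23], [31, 32, 33]]
print(permutation_test(policy_groups))
```

The logic is that if legalization status were unrelated to tweeting, reassigning states to groups at random should produce differences as large as the observed one fairly often; a small p-value indicates that it does not.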

Classification of States by Legal Status

Adjusted state proportions of cannabis
related tweets
[Figure: adjusted tweet rate per state. Legend bins: >3.0%, 2.5%-3.0%, 2.0%-2.49%, 1.5%-1.9%, 1.0%-1.49%. Overlays: Medical Marijuana Legal; Recreational Marijuana Legal]

Tweet Content Classification Results
Source/Type of communication
• 76.2% were personal communications,
• 21.1% media
• 2.7% retail-related
Sentiment
• About 71% of personal communication tweets expressed
positive sentiment towards cannabis,
• 16% negative sentiment,
• 13% were neutral.
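
The positive-to-negative ratio used in the next map can be illustrated with the overall percentages above. The helper name is hypothetical, and the per-state counts are invented purely to show the calculation.

```python
def positive_to_negative_ratio(positive, negative):
    """Ratio of positive to negative tweets; None if there are no negatives."""
    if negative == 0:
        return None
    return positive / negative


# Overall figures from this slide: 71% positive, 16% negative.
overall = positive_to_negative_ratio(71, 16)
print(f"overall ratio: {overall:.2f}")

# Invented per-state counts of (positive, negative) personal tweets.
state_counts = {"State X": (900, 150), "State Y": (400, 200)}
for state, (pos, neg) in state_counts.items():
    print(state, positive_to_negative_ratio(pos, neg))
```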

Mapping Positive to Negative Sentiment Tweet Ratios

Conclusion
• Social media data present exciting new opportunities
for timely, sensitive and flexible approaches to
epidemiological surveillance of drug use practices
and trends.
• Continued research is needed to establish
methodological standards and practices to reduce
the “noise” and increase reliability and validity of
social media data.
• Social media monitoring can be of particular value
for tracking cannabis-related trends in the context of
rapid policy changes.

Keep up with our research/publications:
Project page:
http://wiki.knoesis.org/index.php/EDrugTrends
or Google: eDrugTrends
or Twitter: @eDrugTrends
Thank you!
Center for Interventions,
Treatment, and Addiction
Research (CITAR)
https://medicine.wright.edu/citar
Ohio Center of Excellence in
Knowledge-enabled
Computing (Kno.e.sis)
http://knoesis.org
Sponsored by:
Grant No. 5R01DA039454-02
Trending: Social media analysis to monitor cannabis and synthetic cannabinoid use.
Any opinions, findings, conclusions or recommendations expressed in this material are those of the investigator(s)
and do not necessarily reflect the views of the National Institutes of Health.

System Architecture
eDrugTrends is an extension of the Twitris™ system developed at Kno.e.sis: http://twitris.knoesis.org