Analyzing social media to characterize HIV at-risk populations among MSM in San Diego
Narendran Thangarajan (1), Dr. Nella Green (3), Dr. Amarnath Gupta (2), Dr. Susan Little (3), Dr. Nadir Weibel (1)
(1) Department of CSE, UC San Diego; (2) San Diego Supercomputer Center; (3) School of Medicine, UC San Diego
Contact: naren@ucsd.edu
Digital Epidemiology
This research is funded by the Frontier of Innovative Scholars Program (FISP), UCSD, and the Center for AIDS Research (CFAR), UCSD.
35 million people living with HIV worldwide.
1.2 million people living with HIV in the US.
660,000 total deaths caused by AIDS in the US.
78% of new HIV infections among men in the US in 2010 were among MSM.
California (along with Florida) had the
highest number of HIV diagnoses in 2013.
Interesting recent trend - Proliferation of social networks and
real-time communication capabilities.
“Just treated an HIV-infected person from location X. We should
probably conduct a PrEP intervention at X.”
“We should deploy peer education in location Y; most of our
patients are from there.”
Problem
Ineffective prevention strategies: 50,000 new HIV infections each year.

Solution
Characterize and identify HIV at-risk MSM populations by studying
user sentiments and behaviors on social networks.

Timeline of prior work:
2008: Ginsberg et al. published “Detecting influenza epidemics using search engine query data” in Nature.
2012: Salathé et al. published “Digital Epidemiology” in PLoS Computational Biology.
2014: Sean D. Young et al., “Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes”, Preventive Medicine (Elsevier).
2014: Dr. Elizabeth Murnane, “Unraveling Abstinence and Relapse: Smoking Cessation Reflected in Social Media”, CHI 2014.
2015: this work.
Method
I. Data collection, classification and refinement
• Tweets are collected in real time through the Twitter Streaming API. Twitter’s “filter hose” is used to collect tweets from San Diego County.
• Each tweet is cleaned by removing stop words and punctuation and converting the text to lower case.
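The cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list and the function name `clean_tweet` are assumptions, since the poster does not say which list or tooling was used.

```python
import string

# Illustrative stop-word list (an assumption; the poster does not name one).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "in", "on", "at", "for", "this"}

# Translation table that deletes every ASCII punctuation character.
# Note that this also strips "@" and "#", so mentions and hashtags
# are reduced to bare words.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet(text):
    """Lower-case the text, strip punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]

print(clean_tweet("This is the BEST party, in Hillcrest!"))
```

Returning a token list rather than a string keeps the output ready for keyword matching or a bag-of-words feature extractor.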
II. Improving the accuracy of HIV risk tweet classification using machine learning
To improve the accuracy of HIV risk tweet classification, we evaluated two linear classifiers, Support Vector Machines (SVM) and logistic regression, with different sets of features.

Error rates using different linear classifiers
Feature set             SVM       Logistic regression
Bag of words            15.73%    15.72%
Stop word removal       12.9%     12.98%
Domain-specific terms   11.37%    7.42%
Tweeter information     17.12%    15.23%

III. Migration from raw Twitter data to social network graph
• The property graph model was adopted as the data model for the HIV at-risk MSM Twitter social network.
• 7 node types and 9 edge types were identified, as shown.
• Ontologies (shown in green) are used to infer indirect relationships between entities. For instance, they allow us to query for users who post tweets related to meth and sex venues.
• The resulting graph was materialized in the Neo4j graph database.
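To make the ontology-based inference concrete, here is a small in-memory sketch of a property-graph-style query for "users who post tweets related to meth and sex venues". The node identifiers and edge type names (`POSTED`, `MENTIONS_TERM`, `IS_A`) are hypothetical; the poster's actual 7 node types and 9 edge types appear only in its diagram, and the real graph lives in Neo4j rather than in Python dictionaries.

```python
from collections import defaultdict

# Hypothetical edges as (source, edge_type, target) triples.
EDGES = [
    ("user:alice", "POSTED", "tweet:1"),
    ("user:alice", "POSTED", "tweet:2"),
    ("user:bob", "POSTED", "tweet:3"),
    ("tweet:1", "MENTIONS_TERM", "term:meth"),
    ("tweet:2", "MENTIONS_TERM", "term:bathhouse"),
    ("tweet:3", "MENTIONS_TERM", "term:meth"),
    # Ontology edges: a concrete term belongs to a broader concept.
    ("term:meth", "IS_A", "concept:Drug"),
    ("term:bathhouse", "IS_A", "concept:SexVenues"),
]

# Adjacency index: node -> edge type -> set of target nodes.
OUT = defaultdict(lambda: defaultdict(set))
for src, etype, dst in EDGES:
    OUT[src][etype].add(dst)

def concepts_of_user(user):
    """Follow POSTED -> MENTIONS_TERM -> IS_A to collect a user's concepts."""
    found = set()
    for tweet in OUT[user]["POSTED"]:
        for term in OUT[tweet]["MENTIONS_TERM"]:
            found |= OUT[term]["IS_A"]
    return found

def users_with_all(concepts):
    """Users whose tweets cover every concept in the given set."""
    users = [n for n in list(OUT) if n.startswith("user:")]
    return sorted(u for u in users if concepts <= concepts_of_user(u))

print(users_with_all({"concept:Drug", "concept:SexVenues"}))
```

The user never tweeted the word "drug"; the match comes from the `IS_A` edges, which is the kind of indirect relationship the ontology layer provides. In a graph database the same traversal would typically be written as a single pattern-matching query.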
Analysis
Results obtained using EDA queries
Exploratory Data Analysis (EDA) queries helped reveal hidden patterns in the HIV at-risk social network.
Results
Querying the social graph to identify interesting communication structures
Currently, we have a queryable HIV at-risk Twitter network graph.
• Proximity: How close are drug bucket users to other homosexual bucket users in terms of hop count?
• Topics of interest: What are the main topics in the discussions among people who are at a one-hop following distance from their subgraph’s hubs?
• Conversations: How many conversations are happening among drug bucket users alone, among sex bucket users alone, and across drug bucket users and sex bucket users?
• Preferences: Identify the two drug bucket users who are most consulted by homosexual bucket users.
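The proximity question above amounts to a shortest-path search. The sketch below answers it with a breadth-first search over a toy follow graph; the user names, the bucket membership, and the choice to follow edges in one direction only are all assumptions made for illustration.

```python
from collections import deque

def hops_to_nearest(graph, source, targets):
    """Fewest hops from source to any other node in targets.

    graph maps each node to a list of neighbors. Returns None when
    no target is reachable.
    """
    targets = set(targets) - {source}   # "other" users only
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in seen:
                continue
            if neighbor in targets:
                return dist + 1
            seen.add(neighbor)
            queue.append((neighbor, dist + 1))
    return None

# Toy follow graph; names and bucket labels are made up.
FOLLOWS = {"d1": ["u1"], "u1": ["h1"], "d2": ["h1"], "d3": []}
HOMOSEXUAL_BUCKET = {"h1"}

for drug_user in ("d1", "d2", "d3"):
    print(drug_user, hops_to_nearest(FOLLOWS, drug_user, HOMOSEXUAL_BUCKET))
```

Averaging these hop counts over all drug bucket users would give one possible summary of how close the two buckets are.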
Future
Current status and future work
Risk bucket legend: (0) Drug, (1) Homosexual, (2) STI, (3) Sex, (4) SexVenues
The HIV at-risk MSM social network, coupled with the real-world HIV transmission network inferred using phylodynamics from SD PIC, will help us understand whether the actual sexual network can be reconstructed from the social network. Ultimately, this social network could predict an individual’s future HIV transmission risk, enabling us to prevent transmission in real time.
I. Data collection, classification and refinement (continued)
• Each tweet is classified as an HIV risk tweet if it falls into one of the five HIV risk categories: Drug, SexVenues, Sex, Homosexual, Sexually Transmitted Infections.
• Classified tweets are refined further using exclusion and inclusion lists of co-occurring words. For example, “ice cold” does not refer to meth (a drug commonly called “ice”).
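A rough sketch of keyword classification with an exclusion list is shown below. The term lists and the exclusion words are hypothetical placeholders, since the poster does not publish its lists; only the "ice cold" example comes from the text. An inclusion list would work the same way in reverse, accepting a term only when one of its companion words is present.

```python
# Hypothetical keyword lists for two of the five risk categories.
CATEGORY_TERMS = {
    "Drug": {"meth", "ice"},
    "SexVenues": {"bathhouse"},
}

# Co-occurring words that cancel a match for an ambiguous term.
# "ice cold" comes from the poster; the other words are assumptions.
EXCLUSIONS = {"ice": {"cold", "cream", "hockey"}}

def classify(tokens):
    """Return the risk categories whose terms appear in the tokens,
    skipping any term that co-occurs with one of its exclusion words."""
    present = set(tokens)
    categories = set()
    for category, terms in CATEGORY_TERMS.items():
        for term in terms & present:
            if EXCLUSIONS.get(term, set()) & present:
                continue  # ambiguous term used in its everyday sense
            categories.add(category)
    return categories

print(classify(["ice", "cold", "beer"]))
print(classify(["got", "ice", "tonight"]))
```

A tweet counts as an HIV risk tweet when the returned set is non-empty, and a tweet can fall into more than one category at once.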
• After obtaining a refined set of HIV risk tweets, the relevant metadata (such as the tweeters and the mentioned users) was fetched using Twitter’s public APIs.
• Retweet and reply chains were pulled in recursively to ensure that the original tweet and the corresponding tweeter were part of the resulting social network graph.
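The recursive pull of retweet and reply chains can be illustrated as below. This is a sketch under stated assumptions: `parent_id` is a hypothetical field standing in for whatever reply-to or retweet-of reference the API returns, and `fetch` stands in for an API lookup, here backed by a plain dictionary.

```python
def resolve_chain(tweet_id, fetch, seen=None):
    """Walk parent links back to the original tweet.

    Returns the list of tweets from the given one up to the root.
    fetch(tweet_id) must return a tweet dict, or None if unavailable.
    """
    seen = set() if seen is None else seen
    if tweet_id in seen:        # guard against cycles in bad data
        return []
    seen.add(tweet_id)
    tweet = fetch(tweet_id)
    if tweet is None:           # deleted or protected tweet
        return []
    parent = tweet.get("parent_id")
    if parent is None:          # reached the original tweet
        return [tweet]
    return [tweet] + resolve_chain(parent, fetch, seen)

# Stand-in for the API: a dict of already-downloaded tweets.
STORE = {
    3: {"id": 3, "user": "carol", "parent_id": 2},
    2: {"id": 2, "user": "bob", "parent_id": 1},
    1: {"id": 1, "user": "alice", "parent_id": None},
}

chain = resolve_chain(3, STORE.get)
print([t["user"] for t in chain])
```

Every tweet and tweeter in the returned chain can then be added to the graph, so the original author is present even when only a reply matched a risk category.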
[Figure panels: most active time of the day; most active day of the week; power-law distribution of tweets; length of HIV risk tweets; tweet distribution across risk buckets; most co-occurring risk categories]
• IRB approval and recruitment: Currently, we are collecting Twitter handles of people in the HIV transmission network and those at risk of acquiring HIV. This enables us to compare the structural similarities between the sexual network and the Twitter social network.
• Interactive data visualizations of the evolving HIV at-risk social network, to decipher underlying patterns in network structure evolution and the corresponding changes in SNA metrics.
• A computational model that captures the behavior of an HIV at-risk user on Twitter.
• Collaboration with Harvard to identify change-points in the social network structure.
[Figure: social network compared with sexual network]
