Title: Leveraging Big Data and Social Sensors for Predicting Epidemic Disease Outbreak
Event & Venue: 3rd Big Data Conclave at VIT Chennai, 20th & 21st April 2017
Project Conducted Under Guide: Dr. Sweetlin Hemalatha, Professor at VIT Chennai
Poster presentation at the 3rd Big Data Conclave, VIT Chennai, on 20th April 2017
LEVERAGING BIG DATA AND SOCIAL SENSORS FOR PREDICTING
EPIDEMIC DISEASE OUTBREAK
ROHIT KRISHNA DESAI
INTRODUCTION
My research objective is to leverage social media and
Internet search to predict epidemic outbreaks by
incorporating Big Data and social sensors.
The research work culminates in building a predictive
model for epidemic disease outbreaks.
The epidemic disease model helps detect the
prevalence of infectious diseases and reduce their
spread through early warnings, thereby saving human
lives.
OBJECTIVE
Dengue is a serious disease transmitted by the bite of
infected female mosquitoes. It cuts lives short, partly
because people ignore the symptoms of dengue in the
false belief that nothing is wrong with them.
Our model is based on tweets collected from many
Twitter users, filtered by the keyword “#dengue”, or by
“#dengue” combined with “death”, to keep only the
relevant tweets.
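The keyword filter described above can be sketched in base R. This is a minimal illustration, not the poster's actual code: the sample tweets and the helper name `has_all_keywords` are hypothetical, and real input would come from the Twitter API.

```r
# Hypothetical sample tweets; real input would come from the Twitter API.
tweets <- c(
  "Three new #dengue cases reported in Chennai this week",
  "Monsoon traffic is terrible today",
  "#dengue death toll rises, please use mosquito nets",
  "Sad news: another death in the accident on the highway"
)

# TRUE for each tweet that contains every keyword
# (case-insensitive, literal substring match).
has_all_keywords <- function(text, keywords) {
  matches <- lapply(keywords, function(k) {
    grepl(tolower(k), tolower(text), fixed = TRUE)
  })
  Reduce(`&`, matches)
}

# Tweets tagged "#dengue"
dengue_tweets <- tweets[has_all_keywords(tweets, "#dengue")]

# Tweets that mention both "#dengue" and "death"
dengue_death_tweets <- tweets[has_all_keywords(tweets, c("#dengue", "death"))]

print(dengue_tweets)
print(dengue_death_tweets)
```

Requiring both keywords narrows the set to tweets that report fatalities, while the single-keyword filter keeps every dengue-related tweet.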
We create a Twitter account to obtain login
credentials, then register our own application to
fetch tweets from Twitter. Twitter issues the
application four secret credentials: consumer key,
consumer secret, access token, and access token
secret. These are unique to each user and to each
application.
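One way to handle the four credentials is to keep them out of the script and read them from environment variables. The sketch below is an assumption about how this could be done: the environment variable names and the helper `read_credentials` are hypothetical. The commented-out lines assume the `twitteR` package, which the poster does not name.

```r
# Environment variable names are assumptions made for this sketch.
credential_vars <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

# Read all four credentials and stop with a clear message if any is unset.
read_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)
  absent <- names(values)[values == ""]
  if (length(absent) > 0) {
    stop("Missing Twitter credentials: ", paste(absent, collapse = ", "))
  }
  as.list(values)
}

# Possible usage, assuming the twitteR package is installed:
# creds <- read_credentials()
# library(twitteR)
# setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                     creds$access_token, creds$access_secret)
# tweets <- searchTwitter("#dengue", n = 100)
```

Keeping the secrets in the environment avoids committing them alongside the analysis code.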
TECHNOLOGY AND TOOLS USED
Language: R
IDE: RStudio GUI
Social Sensor: Twitter
WORK FLOW
1. Creating a Twitter application
2. Loading the data
3. Extracting features from text data
4. Cleaning the data, as it had some missing values
5. Working in RStudio: building the corpus
6. Saving tweets
7. Exploratory data analysis
8. Creating the bag-of-words model
9. Sentiment function for positive and negative sentiment
10. Scoring tweets and adding the scores as a column
11. Graphing the tweets for a particular location on a map
12. Importing the CSV file
13. Visualizing the tweets
14. Text analysis
15. Word clouds
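The cleaning, bag-of-words, and scoring steps of the workflow can be sketched in base R as follows. This is an illustration under stated assumptions, not the poster's code: the word lists, the sample tweets, and the helper names `clean_tweet` and `score_tweet` are all hypothetical, and a real run would use a full sentiment lexicon and stop-word list.

```r
# Hypothetical word lists; a real run would use a full lexicon.
positive_words <- c("alert", "prevent", "safe", "recover", "awareness")
negative_words <- c("death", "fear", "worse", "panic")
stop_words     <- c("the", "a", "in", "of", "to", "is", "and", "for")

# Lowercase, drop URLs and non-letter symbols, split into words,
# and remove stop words.
clean_tweet <- function(text) {
  text <- tolower(text)
  text <- gsub("http\\S+", " ", text, perl = TRUE)
  text <- gsub("[^a-z\\s]", " ", text, perl = TRUE)
  words <- strsplit(trimws(text), "\\s+", perl = TRUE)[[1]]
  words[nzchar(words) & !(words %in% stop_words)]
}

# Score = number of positive words minus number of negative words.
score_tweet <- function(text) {
  words <- clean_tweet(text)
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

# Hypothetical sample tweets.
tweets <- c(
  "Dengue alert in Chennai: stay safe and prevent stagnant water! http://example.com",
  "Another #dengue death reported, panic in the city",
  "Dengue cases counted today"
)

# Bag of words: term frequencies over all cleaned tweets.
bag <- table(unlist(lapply(tweets, clean_tweet)))

# Score each tweet and add the score and a sentiment label as columns.
scored <- data.frame(text = tweets, stringsAsFactors = FALSE)
scored$score <- vapply(tweets, score_tweet, numeric(1), USE.NAMES = FALSE)
scored$sentiment <- ifelse(scored$score > 0, "positive",
                    ifelse(scored$score < 0, "negative", "neutral"))

print(bag)
print(scored[, c("score", "sentiment")])
```

A word-count score is the simplest possible sentiment function; it ignores negation and context, so it should be read as a baseline only.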
RESULTS
1) Tweets collected from Twitter
2) Training code after preprocessing
3) Map of dengue-affected locations in India
4) Histogram of retweet counts
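A histogram like the one in result 4 can be produced with base R's `hist`. The retweet counts below are made-up values for illustration only, and the bin width of 5 is an arbitrary choice; they are not the poster's data.

```r
# Hypothetical retweet counts, one per collected tweet.
retweet_counts <- c(0, 0, 1, 2, 2, 3, 5, 8, 13, 21)

# Compute the bins first (plot = FALSE) so the counts can be inspected.
h <- hist(retweet_counts, breaks = seq(0, 25, by = 5), plot = FALSE)
print(h$counts)

# Draw the histogram from the precomputed bins.
plot(h,
     main = "Retweet count per tweet",
     xlab = "Retweets",
     ylab = "Number of tweets")
```

Most tweets typically fall into the lowest bin, so a log-scaled axis may be worth trying on real data.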
CONCLUSION AND RECOMMENDATION
Semantic analysis is performed on the dataset after cleaning it by
removing stop words and unwanted symbols. Text mining is carried out
for further text analytics, so that predictions and comparisons can be
made with respect to the chosen keywords. The data is then visualized
with respect to the queries, which makes the results easier to
understand.
The performance of this model depends on the positive tweets we get
from Twitter. If the tweets are neutral or negative, we cannot show
their impact on the area they mention. Positive tweets tell us that a
dengue incident has occurred in a particular area, or they carry alert
messages to stay away from the area, or guidelines and preventive
measures for people.
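To illustrate how positive tweets could point to an affected area, the sketch below counts positive tweets per location and flags locations above a cut-off. The data frame, its column names, and the threshold value are all hypothetical; the poster does not specify how areas are flagged.

```r
# Hypothetical scored tweets with a location attached to each.
scored <- data.frame(
  location  = c("Chennai", "Chennai", "Mumbai", "Chennai", "Delhi", "Mumbai"),
  sentiment = c("positive", "positive", "neutral",
                "positive", "negative", "positive"),
  stringsAsFactors = FALSE
)

# Keep only the positive tweets and count them per location.
positive <- scored[scored$sentiment == "positive", ]
counts <- sort(table(positive$location), decreasing = TRUE)

# Hypothetical cut-off: flag a location once it reaches this many
# positive tweets.
alert_threshold <- 2
flagged <- names(counts)[as.vector(counts) >= alert_threshold]

print(counts)
print(flagged)
```

Neutral and negative tweets drop out before counting, which mirrors the point above that only positive tweets contribute to the signal for an area.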
PROPOSED METHODOLOGY
• Data preprocessing and cleaning
• Logistic regression
• Text mining
• Sentiment analysis
• Visualization
• Analysis of data
• Generation of training data
• Bag of words for positive and negative sentiment
• Architecture model
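The logistic regression step in the list above can be sketched with `glm` from base R. The poster does not state the predictors or the response variable, so this example assumes a single hypothetical predictor (weekly count of dengue tweets) and a binary outbreak label; the numbers are synthetic.

```r
# Synthetic training data: weekly dengue tweet counts and whether an
# outbreak was recorded that week (1 = yes, 0 = no). Both are assumptions.
training <- data.frame(
  tweets_per_week = c(5, 8, 12, 15, 20, 22, 30, 35, 40, 50),
  outbreak        = c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
)

# Fit a logistic regression: P(outbreak) as a function of tweet volume.
model <- glm(outbreak ~ tweets_per_week, data = training, family = binomial)

# Predicted outbreak probability for two new weeks.
new_weeks <- data.frame(tweets_per_week = c(10, 45))
p <- predict(model, newdata = new_weeks, type = "response")

print(coef(model))
print(round(p, 3))
```

With real data, the predictor set would likely include sentiment scores and location features rather than tweet volume alone.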