Disease Prediction by Machine Learning Over Big Data From Healthcare Communities

PRESENTED BY
MD. ZABIRUL ISLAM
1507110
MIN CHEN (SENIOR MEMBER, IEEE)
PUBLISHED IN: IEEE JOURNAL & MAGAZINE (VOLUME: 5, PAGES: 8869-8879, APRIL 2017)

OUTLINE
• Motivation
• Introduction
• Dataset
• Evaluation Methods
• CNN-Based Unimodal Disease Risk Prediction (CNN-UDRP) Algorithm
• Experimental Results
• Conclusion
Department of Computer Science and Engineering

Motivation
• In modern society, the incidence of chronic disease is increasing as living standards improve.
• According to a Chinese report in 2015, 86.6% of deaths are caused by chronic disease.
• With the development of big data analytics technology, accurate analysis of medical data can benefit early disease detection.

Introduction
• Existing models use structured data to classify patients as either high risk or low risk.
• But for a complex disease, structured data alone is not a good way to describe the disease.
• We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm that uses structured and unstructured hospital data.
• In this paper, we mainly focus on the risk prediction of cerebral infarction.

DATASET
• Structured data (S-data): the patient's basic information, such as age, gender, and life habits, together with laboratory data.
• Unstructured text data (T-data): the patient's narration of his/her illness and the doctor's interrogation records and diagnosis.
• We use S-data, T-data, and combined S&T-data to predict whether the patient is at high risk of cerebral infarction.
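
The three input settings above (S-data, T-data, S&T-data) can be pictured with a small sketch. This is only an illustration, not the paper's code: the `PatientRecord` fields, their numeric encodings, and the use of plain concatenation to combine S-data with text features are all assumptions made for the example, and the text features are taken as already extracted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatientRecord:
    # S-data: hypothetical structured fields, already encoded as numbers
    age: float
    gender: int              # hypothetical encoding: 0 or 1
    smoker: int              # one "life habit", hypothetical encoding: 0 or 1
    lab_values: List[float]  # laboratory data
    # T-data: free text such as the patient's narration
    narration: str


def structured_vector(record: PatientRecord) -> List[float]:
    """Flatten the structured fields into one numeric vector."""
    return [record.age, float(record.gender), float(record.smoker), *record.lab_values]


def select_inputs(record: PatientRecord, text_features: List[float], mode: str) -> List[float]:
    """Return the feature vector for one setting: 'S', 'T', or 'S&T'.

    text_features stands for features already extracted from record.narration;
    concatenation for 'S&T' is an assumption of this sketch.
    """
    s_vec = structured_vector(record)
    if mode == "S":
        return s_vec
    if mode == "T":
        return list(text_features)
    if mode == "S&T":
        return s_vec + list(text_features)
    raise ValueError(f"unknown mode: {mode!r}")


record = PatientRecord(age=62.0, gender=1, smoker=1, lab_values=[5.4, 1.2],
                       narration="sudden numbness and slurred speech")  # made-up record
combined = select_inputs(record, text_features=[0.1, 0.2], mode="S&T")
print(combined)
```

Keeping the mode as an explicit argument makes it easy to run the same classifier on each of the three settings and compare them.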

CNN-BASED UNIMODAL DISEASE RISK
PREDICTION (CNN-UDRP) ALGORITHM
• CNN-UDRP: uses only text data to predict a high risk of cerebral infarction.
• CNN-MDRP: uses both structured and unstructured text data for prediction.
• To process medical text data, we use the CNN-based unimodal disease risk prediction (CNN-UDRP) algorithm, which can be divided into the following five steps.

CONT.
• REPRESENTATION OF TEXT DATA:
Each word in the medical text is represented as a vector, i.e., a word embedding in NLP.
• CONVOLUTION LAYER OF TEXT CNN:
For each position in the text, we take a window of s words and use their word vectors as the representation row vector.
• POOL LAYER OF TEXT CNN:
Select the maximum of the n elements in each row, which keeps the features that play a key role in the text.
• FULL CONNECTION LAYER OF TEXT CNN
• CNN CLASSIFIER
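
The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the dimensions, the random weights standing in for trained parameters, the tanh activation, the softmax classifier, and the example sentence are all chosen for the sketch.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the sketch is repeatable
EMBED_DIM = 4            # word-vector length (illustrative)
WINDOW = 3               # s: words per convolution window (illustrative)
NUM_FILTERS = 5          # number of text features extracted (illustrative)
NUM_CLASSES = 2          # high risk vs. low risk


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def embed(tokens, table):
    """Step 1: represent each word as a vector (word embedding)."""
    return [table[t] for t in tokens]


def windows(vectors, s):
    """Step 2a: join every run of s consecutive word vectors into one row vector."""
    return [sum(vectors[i:i + s], []) for i in range(len(vectors) - s + 1)]


def affine(weights, biases, x):
    """Compute W.x + b for a weight matrix W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def convolve(rows, filters, biases):
    """Step 2b: apply every filter at every window position (tanh is assumed)."""
    per_position = [[math.tanh(z) for z in affine(filters, biases, row)] for row in rows]
    # Transpose so each inner list holds one filter's outputs over all positions.
    return [list(col) for col in zip(*per_position)]


def max_pool(feature_maps):
    """Step 3: keep the maximum of each filter's outputs."""
    return [max(fm) for fm in feature_maps]


def softmax(scores):
    """Step 5: turn class scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


tokens = "patient reports sudden numbness and slurred speech".split()  # made-up text
table = {t: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for t in tokens}
filters = rand_matrix(NUM_FILTERS, WINDOW * EMBED_DIM)
filter_bias = [0.0] * NUM_FILTERS
fc_weights = rand_matrix(NUM_CLASSES, NUM_FILTERS)
fc_bias = [0.0] * NUM_CLASSES

rows = windows(embed(tokens, table), WINDOW)
pooled = max_pool(convolve(rows, filters, filter_bias))
probs = softmax(affine(fc_weights, fc_bias, pooled))  # step 4 (full connection), then step 5
print(len(rows), len(pooled), probs)
```

Because pooling keeps one value per filter, the number of filters sets the number of text features, regardless of how long the text is.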

EXPERIMENTAL RESULTS
• For S-data, we use traditional machine learning algorithms, such as decision tree (DT) and naive Bayes (NB), to predict the risk of cerebral infarction.
• DT achieves the highest accuracy (63%) and NB the highest recall (0.80) among the algorithms compared.
• Overall, NB classification performs best on S-data in the experiment.
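
Accuracy and recall, the two measures quoted above, can be computed from the counts of correct and incorrect predictions. The sketch below uses the standard definitions; the label lists are made up for illustration and are not data from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all patients classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def recall(tp, fn):
    """Share of truly high-risk patients that the model flags as high risk."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: 1 = high risk, 0 = low risk
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, tn, fn), recall(tp, fn))  # 0.75 0.75
```

Recall matters for risk prediction because a missed high-risk patient (a false negative) lowers recall even when overall accuracy stays high.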

CONT.
• As the number of iterations increases, the training error rate of CNN-UDRP (T-data) decreases and the test accuracy increases.
• When the number of iterations reaches 70, the training process of the CNN-MDRP (S&T-data) algorithm is already stable.
• We extract 10, 20, ..., 120 features from the text using the CNN.
• When the number of text features is smaller than 30, the accuracy and recall of the CNN-UDRP (T-data) and CNN-MDRP (S&T-data) algorithms are lower.

CONT.
• The accuracy of CNN-UDRP (T-data) is 0.9420 and the recall is 0.9808.
• The accuracy of CNN-MDRP (S&T-data) is 0.9480 and the recall is 0.99923.
• For S-data alone, the corresponding accuracy is low, at around 50%.
• Combining the two types of data raises the accuracy to 94.80%, which gives a better evaluation of the risk of cerebral infarction.

CONCLUSION
• For some simple diseases, e.g., hyperlipidemia, a few features of structured data are enough to describe the disease well.
• But for a complex disease, using only features of structured data is not a good way to describe the disease.
• In this paper, we propose the CNN-MDRP algorithm, which uses structured and unstructured hospital data and reaches 94.8% accuracy.
