Existing model uses structured data to predict the patients of either high risk or low risk.
But for a complex disease, structured data is not a good way to describe the disease.
We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm using structured and unstructured data from hospital.
In this paper, we mainly focus on the risk prediction of cerebral infarction.
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
1. PRESENTED BY
MD. ZABIRUL ISLAM
1507110
MIN CHEN, (SENIOR MEMBER, IEEE)
PUBLISHED IN: IEEE JOURNAL & MAGAZINE ( VOLUME: 5 , PAGES;8869-8879, APRIL 2017)
Disease Prediction by Machine Learning Over
Big Data From Healthcare Communities
3. Motivation
In our modern society ,with the improvement of living standards, the incidence of chronic
disease is increasing.
According to a Chinese report in 2015, 86.6% of deaths are caused by chronic disease.
So with the development of big data analytics technology , accurate analysis of medical data
can give benefits early disease detection.
Department of Computer Science and Engineering 2
4. Introduction
Existing model uses structured data to predict the patients of either
high risk or low risk.
But for a complex disease, structured data is not a good way to
describe the disease.
We propose a new convolutional neural network (CNN)-based
multimodal disease risk prediction algorithm using structured and
unstructured data from hospital.
In this paper, we mainly focus on the risk prediction of cerebral
infarction.
Department of Computer Science and Engineering 3
5. DATASET
Structured data (S-data):the patient's basic information such as the patient's age, gender
and life habits and laboratory data .
Unstructured Text data (T-data): the patient's narration of his/her illness, the doctor's
interrogation records and diagnosis.
We use (S-data,T-data,S&T data) to predict whether the patient is at high-risk of cerebral
infarction.
Department of Computer Science and Engineering 4
6. CNN-BASED UNIMODAL DISEASE RISK
PREDICTION (CNN-UDRP) ALGORITHM
• CNN-UDRP: Only uses on the text data to predict high risk of cerebral infarction.
• CNN-MDRP: Uses for structured and unstructured text data for prediction.
• For the processing of medical text data, we utilize CNN-based unimodal disease risk prediction (CNN-
UDRP) algorithm which can be divided into the following five steps.
Department of Computer Science and Engineering 5
7. CONT.
REPRESENTATION OF TEXT DATA:
Each word in the medical text is represented in the form of vector i.e.Word Embedding in NLP.
CONVOLUTION LAYER OF TEXT CNN:
Every time we choose s words of each word vector in the text as the representation row vector.
POOL LAYER OF TEXT CNN:
Select the max value of the n elements of each row to which play key role in the text.
FULL CONNECTION LAYER OF TEXT CNN
CNN CLASSIFIER:
Department of Computer Science and Engineering 6
8. EXPERIMENTAL RESULTS
For S-data, we use traditional machine learning algorithms i.e.to predict the risk of cerebral
infarction disease.
The highest accuracy of DT is 63% , recall of NB is 0.80 than other
For S-data, the NB classification is the best in experiment.
Department of Computer Science and Engineering 8
9. CONT..
The number of iterations increasing, the training error
rate of the CNN-UDRP (T-data) decreases test
accuracy increases
when the number of iterations are 70, the training
process of CNN-MDRP (S&T-data) algorithm is
already stable
We extract 10, 20,….,120 features from text by using CNN.
When the feature number of text is smaller than 30, the
accuracy and recall of CNN-UDRP (T-data) and CNN-MDRP
(S&T-data) algorithms are smaller.
Department of Computer Science and Engineering 7
10. CONT.
The accuracy of CNN-UDRP (T-data) is 0.9420 and the recall is 0.9808
The accuracy of CNN-MDRP (S&T-data) is 0.9480 and the recall is 0.99923
As seen for S-data the corresponding accuracy is low, which is roughly around 50%.
We find that by combining these two data, the accuracy rate can reach 94.80% to better evaluate the risk of cerebral infarction disease.
Department of Computer Science and Engineering 9
11. CONCLUSION
• For some simple disease, e.g., hyperlipidemia, only a few features of structured data can get a good
description of the disease,
• But for a complex disease, only using features of structured data is not a good way to describe the
disease.
• In this paper, we propose (CNN-MDRP) algorithm using structured and unstructured data from hospital
with 94.8% accuracy.
Department of Computer Science and Engineering 10
12. REFERENCES
P. Groves, B. Kayyali, D. Knott, and S. van Kuiken, The`Big Data'Revolution in Healthcare: Accelerating Value and Innovation. USA:
Center for US Health System Reform Business Technology Ofce, 2016.
M. Chen, S. Mao, and Y. Liu, ``Big data: A survey,'' Mobile Netw. Appl.,vol. 19, no. 2, pp. 171209, Apr. 2014.
P. B. Jensen, L. J. Jensen, and S. Brunak, ``Mining electronic health records: Towards better research applications and clinical care,''
NatureRev. Genet., vol. 13, no. 6, pp. 395405, 2012.
Department of Computer Science and Engineering 11