INDUSTRY EXPERIENCE

REAL-TIME ANALYTICS FOR THE HEALTHCARE INDUSTRY: ARRHYTHMIA DETECTION
Vijay Srinivas Agneeswaran, Joydeb Mukherjee,
Ashutosh Gupta, Pranay Tonpay,
Jayati Tiwari, and Nitin Agarwal
Impetus Infotech Private Limited, Bangalore,
Karnataka, India

Abstract
It is time for the healthcare industry to move from the era of "analyzing our health history" to the age of "managing the future of our health." In this article, we illustrate the importance of real-time analytics across the healthcare industry by providing a generic mechanism to reengineer traditional analytics expressed in the R programming language into Storm-based real-time analytics code. This is a powerful abstraction, since most data scientists write their analytics in R and are not clear on how to make them work in real time and on high-velocity data. Our paper focuses on a healthcare analytics scenario, specifically on the importance of electrocardiogram (ECG) monitoring. A physician can use our framework to compare ECG reports by categorization and consequently detect arrhythmia. The framework reads the ECG signals and uses a machine learning-based categorizer that runs within a Storm environment to compare different ECG signals. The paper also presents some performance studies of the framework to illustrate the throughput and accuracy trade-off in real-time analytics.
Introduction
The healthcare industry is undergoing a major transformation. The old days of using paper records of patients' data are gone with the digitization of healthcare information, starting with the use of electronic health records (EHRs). The use of EHRs is becoming widespread, partly dictated by financial stimulus and partly by governmental regulations. The healthcare industry is now turning to the use of data analytics. The pace is likely to pick up with the advent of the Affordable Care Act (ACA), or "Obamacare," which promises to transform the healthcare industry from fee-for-service to fee-for-value. Moreover, due to the widening of the eligibility requirements and affordability, more people will come into the system for healthcare. This implies the need for big-data analytics, especially for the mandated health exchanges.

The Affordable Care Act has also spurred many innovations in healthcare. This is evident in the number of healthcare startups funded recently, such as the following (this list is only indicative, not intended to be complete, and is biased toward health analytics):
1. Health Catalyst, which provides an analytics suite to analyze EHRs.
2. xG Health Solutions, which provides analytics of population health as well as reporting and interpretation.

Editor's Note: Impetus supports multiple venues for dialogue in big data, providing thought leadership and services to create new ways to analyze data to gain key opportunities in business and industry across enterprises. The following is a description of one potential application of their expertise in machine learning within the healthcare space.


BIG DATA

SEPTEMBER 2013  DOI: 10.1089/big.2013.0018
Agneeswaran et al.

3. Lumeris, which uses real-time analytics of healthcare data to improve patient care, essentially focused on making ACA work for all players including health systems, payers, and providers.
4. Eviti, which provides physicians with actionable information using analytics for cancer-related decision making.
5. Humedica, which uses data from multiple sources, including EHRs and claims data, to help healthcare providers analyze patient data as well as population data.
6. HealthTap, which provides a social platform for physicians and patients to share information as well as build a peer reputation.

A number of startup accelerators include Nanthealth, Rockhealth, Healthbox, and Blueprint Health Services, among others.

This article presents a different scenario requiring real-time analytics of big data and, as an example, applies cutting-edge big data technologies to historical data. The electrocardiogram (ECG) signal provides critical information about the heart activity of a patient. Continuous monitoring of ECG is important when a patient is ambulatory or at the bedside. It is very important to treat arrhythmic patients on time, as delays can lead to potentially fatal complications.1

Arrhythmia detection from ECG signals is a well-studied problem. For instance, Gao et al.1 solve it by using an artificial neural network approach based on a Bayesian framework. Rothman and colleagues2 make a comparison between the two common devices, the loop event monitoring and the mobile cardiac outpatient telemetry system, and their effectiveness in detecting arrhythmias.

Machine Learning-Based Classification of ECG Data

The classifier we have developed works in two modes: the training mode (or learning mode) and the operational mode (or advisory mode). In the training mode, we extract features (i.e., variables or transformed variables) in terms of which arrhythmia types, including the absence of arrhythmia, can be represented, and we learn the parameters of the inference mechanism about the occurrence or nonoccurrence of a type of arrhythmia. In this mode, the results cannot advise the doctor; rather, the input about the label (i.e., type of arrhythmia or absence of it) corresponding to each record provided is used for training (see Fig. 1).

FIG. 1. ML Based Classification of ECG Data: Training Mode.

Once the training is complete, the classifier goes into operational mode, meaning it begins advising the doctor on new, unseen cases that are similar to those seen during training. The doctor arrives at an inference about the presence or absence of arrhythmia taking the output of the classifier into consideration. Also, if arrhythmia is present, the classifier can suggest which type it is. The various types of arrhythmia classes (labels) will be listed in a subsequent section. This mode of operation is depicted in Fig. 2.

MARY ANN LIEBERT, INC. VOL. 1 NO. 3 SEPTEMBER 2013 BIG DATA

The input to the machine learning algorithm is a set of historic patient records. Each record consists of clinical measurements recorded in the past from ECG signals, namely QRS duration and the RR, P-R, and Q-T intervals, along with information such as gender, age, and weight. Each record also carries the categorical label a cardiologist had assigned to it, either "normal" or one of the 15 pathology categories. The records comprise a total of 279 features, as enumerated by Guvenir et al.3

FIG. 2. ML Based Classification of ECG Data: Operational Mode.

Class names and description
Class distribution (database: Arrhythmia):

Class code  Class                                        Number of instances
01          Normal                                       245
02          Ischemic changes (Coronary Artery Disease)   44
03          Old Anterior Myocardial Infarction           15
04          Old Inferior Myocardial Infarction           15
05          Sinus tachycardy                             13
06          Sinus bradycardy                             25
07          Ventricular Premature Contraction (PVC)      3
08          Supraventricular Premature Contraction       2
09          Left bundle branch block                     9
10          Right bundle branch block                    50
11          1. degree AtrioVentricular block             0
12          2. degree AV block                           0
13          3. degree AV block                           0
14          Left ventricule hypertrophy                  4
15          Atrial Fibrillation or Flutter               5
16          Others                                       22

Description of dataset
We analyzed a dataset containing 452 records belonging to patients of different ages, weights, heights, and genders (see http://archive.ics.uci.edu/ml/datasets/Arrhythmia for more information).
There are in all 280 variables, including the arrhythmia class type as the 280th column, in the database downloaded from the source.1 Values for this column can be 1 to 16, representing one of the codes enumerated above. There are 5 categorical variables and 274 numeric variables.
Five variables had missing values in their records, as enumerated below. These variables occurred in columns 11 to 15 in the original dataset.

Vector angles in degrees on front plane of:
11  T           8 values missing
12  P           22 values missing
13  QRST        1 value missing
14  J           376 values missing
Number of heart beats per minute:
15  Heart rate  1 value missing

Some of the variables had "0" throughout the column (i.e., across all records). Those variables are enumerated below with their column number followed by the variable name:

20   "DI S-prime Wave"
68   "AVL S-prime Wave"
70   "AVL Existence of ragged R wave"
84   "AVF Existence of ragged P wave"
132  "V4 Existence of ragged P wave"
133  "V4 Existence of diphasic derivation of P wave"
140  "V5 S-prime Wave"
142  "V5 Existence of ragged R wave"
144  "V5 Existence of ragged P wave"
146  "V5 Existence of ragged T wave"
152  "V6 S-prime Wave"
157  "V6 Existence of diphasic derivation of P wave"
158  "V6 Existence of ragged T wave"
165  "DI Amplitude S-prime Wave"
205  "AVL Amplitude S-prime Wave"
265  "V5 Amplitude S-prime Wave"
275  "V6 Amplitude R-prime Wave"

The first four columns in the original datafile had non-ECG variables as follows:

1  Age: Age in years, linear
2  Sex: Sex (0 = male; 1 = female), nominal
3  Height: Height in centimeters, linear
4  Weight: Weight in kilograms, linear
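
As a quick consistency check on the class distribution, the per-class counts can be summed and converted to shares. This is a minimal sketch: the counts are copied from the class distribution table, while the names `CLASS_COUNTS` and `class_shares` are illustrative and not part of the original study.

```python
# Class code -> number of instances, copied from the class distribution table.
CLASS_COUNTS = {
    1: 245, 2: 44, 3: 15, 4: 15, 5: 13, 6: 25, 7: 3, 8: 2,
    9: 9, 10: 50, 11: 0, 12: 0, 13: 0, 14: 4, 15: 5, 16: 22,
}


def class_shares(counts):
    """Return each class's share of all records, in percent."""
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


total_records = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)
empty_classes = [code for code, n in CLASS_COUNTS.items() if n == 0]

print(total_records)         # 452, the number of records in the dataset
print(round(shares[1], 1))   # share of the Normal class, in percent
print(empty_classes)         # class codes with no examples at all
```

The Normal class alone accounts for more than half of the records, which is the imbalance discussed later in the paper.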

Classification algorithm

We chose the random forest (RF) classifier2 for several reasons: it is fast to train; its out-of-bag (OOB) error is a good estimate of the generalization error; it can handle noisy data; it can suggest "important variables," from which a parsimonious predictive model can be built; and it has an associated imputation method that is at times a better choice than external imputation methods. Additionally, two or more separately trained RFs can be combined without incurring much computational expense. RF is an ensemble classifier (i.e., a collection of classifiers), which predicts by counting the votes cast by each constituent classifier for a class on a query record. The predictive performance of an ensemble classifier is typically better than that of any of its constituents. The constituent classifiers for RF are classification trees. The advantage of using such classifiers is that individual trees may be barely accurate (slightly better than random guessing), but combining them may produce a classifier with much higher accuracy. Also, a great deal of variance may be present as we move from one tree to another, but the overall classifier's variance is reduced because of the averaging that takes place in the course of "ensembling."
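
The voting step described above can be illustrated with a small sketch. It is not the randomForest implementation: the three "trees" are hypothetical threshold rules on a made-up `heart_rate` field, and the tie-breaking rule (smallest class code wins) is an assumption made only for this example.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class with the most votes; ties go to the smallest class code."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def ensemble_predict(classifiers, record):
    """Ask every constituent classifier for a label and return the majority label."""
    return majority_vote([clf(record) for clf in classifiers])


# Three toy "trees" (hypothetical rules, not clinically meaningful).
trees = [
    lambda r: 5 if r["heart_rate"] > 100 else 1,
    lambda r: 5 if r["heart_rate"] > 110 else 1,
    lambda r: 5 if r["heart_rate"] > 90 else 1,
]

print(ensemble_predict(trees, {"heart_rate": 105}))  # two of the three trees vote for class 5
```

Even though the second tree disagrees, the ensemble returns the label that most trees support, which is how individually weak trees can combine into a steadier prediction.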
RF is trained by bagging (bootstrap aggregation) of the training data. Random samples are drawn "with replacement" from the training data, and a classification tree is built from each sample. When the training set is large, each bootstrap sample contains about (1 - 1/e) ≈ 63% of the distinct original records; the remaining 37% or so are out-of-bag for that tree and are used to test it, which yields the OOB error. It can be shown that this error is a fair indication of the generalization error of the RF classifier. Generalization error measures the predictive performance of a classifier when tested with unseen data outside the training set but supposedly generated from the same distribution as the training data. These will be the kind of data encountered by the classifier in the operational mode. The keys to the predictive performance of the RF classifier are the strength of the individual classifiers and the diversity (degree of uncorrelatedness) of the constituent classification trees in the forest in terms of raw margin functions.4
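
The 63% figure for bagging can be checked empirically. The sketch below is an illustration under simple assumptions (452 records, 500 bootstrap samples, a fixed random seed); `bootstrap_split` is a hypothetical helper, not a function of the randomForest package.

```python
import math
import random


def bootstrap_split(n_records, rng):
    """Draw n_records indices with replacement; return (in_bag, out_of_bag) index sets."""
    in_bag = {rng.randrange(n_records) for _ in range(n_records)}
    out_of_bag = set(range(n_records)) - in_bag
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 452                  # number of records in the dataset
fractions = []
for _ in range(500):     # one bootstrap sample per tree
    in_bag, oob = bootstrap_split(n, rng)
    fractions.append(len(in_bag) / n)

mean_in_bag = sum(fractions) / len(fractions)
# The simulated in-bag fraction should be close to 1 - 1/e.
print(round(mean_in_bag, 3), round(1 - 1 / math.e, 3))
```

The records left out of a given sample are exactly the ones available for the out-of-bag error estimate of the tree built from that sample.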

Imputation of missing values
In the exploratory data analysis (EDA) phase, it was found that important variables such as "heart rate," measured in number of heart beats per minute, had some missing values. A couple of imputation algorithms were tried out,5,6 and finally rfImpute from the randomForest package was chosen to impute those missing values. Amelia7 was not considered because it could produce imputed results only with a high value of prior information [with the "empri" parameter set as high as 0.9*nrow(data), when usually 0.01*nrow(data) is used]. A prior this strong amounts to adding a lot of artificial observations with the same mean and variance as the existing observations but with 0 covariance.

Imbalance of data with respect to classes
The gross imbalance in the dataset (Table 1) poses problems for selecting a subset of data to be used for training and testing. If the data are partitioned in the typical way (70%-80% for training and 30%-20% for testing), the measured classification performance will be misleading. There are several ways to partially address this problem. One is to generate artificial data for the minor classes (via SMOTE algorithms and associated packages).8 Another is to down-sample data from the major class. We have chosen the latter path [i.e., subsampling the major class (Normal class) in proportion to the minor class (those classes that had at least 10% data)]. While subsampling the major class, we made sure that its maximum number did not exceed 100% of that of the minor class. Furthermore, weights were used for the training examples supplied to the RF classifier. Classes that had single-digit representation, namely Left ventricule hypertrophy (0.9%), Atrial Fibrillation or Flutter (1.1%), Ventricular Premature Contraction (PVC) (0.7%), Supraventricular Premature Contraction (0.4%), and Left bundle branch block (1.9%), were not addressed.
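
Down-sampling the major class can be sketched as follows. This is a simplified illustration, not the authors' R code: the helper `downsample_majority`, the toy label list, and the random seed are assumptions, and the example weights for training examples are left out.

```python
import random


def downsample_majority(labels, majority_label, max_majority, rng):
    """Return sorted record indices, keeping at most max_majority records of the majority class."""
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    others = [i for i, y in enumerate(labels) if y != majority_label]
    kept = rng.sample(majority, min(max_majority, len(majority)))
    return sorted(kept + others)


# Toy labels: 245 Normal (code 1), 44 of class 2, and 50 of class 10.
labels = [1] * 245 + [2] * 44 + [10] * 50
rng = random.Random(42)
for cap in (100, 90, 80, 70):   # caps on the Normal class tried in the study
    idx = downsample_majority(labels, 1, cap, rng)
    n_normal = sum(1 for i in idx if labels[i] == 1)
    print(cap, n_normal, len(idx))
```

All minor-class records are kept, so only the size of the Normal class changes from one cap to the next.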

Variable selection for model building
Variable selection plays a major role in the development of predictive models. In this study, one of the reasons for selecting the RF classifier over other alternatives was that it has a means of assessing the effectiveness of each variable occurring in the model, which lets us build a parsimonious model for deployment. The criteria by which RF ranks its "Important Variables" are "Mean Decrease Accuracy" and "Mean Decrease Gini." We prefer the latter for selecting the important variables because some reports in the literature indicate that the other measure is not stable.9 All variables with a Mean Decrease Gini value greater than the mean of that measure across variables are retained in the model; that is, the criterion threshold is set to the mean value (see Fig. 2). The complete variable list with descriptions is provided in the online reference (http://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/arrhythmia.names).
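
The thresholding rule can be sketched in a few lines. The importance scores below are made up for illustration and are not values from the study; `select_important` is a hypothetical helper that assumes the Mean Decrease Gini scores have already been obtained from a trained forest.

```python
def select_important(importance):
    """Keep variables whose importance score exceeds the mean score."""
    threshold = sum(importance.values()) / len(importance)
    kept = [name for name, score in importance.items() if score > threshold]
    return kept, threshold


# Hypothetical Mean Decrease Gini scores; these are not values from the study.
scores = {"heart_rate": 4.0, "qrs_duration": 3.0, "q_t_interval": 0.5, "p_r_interval": 0.5}
kept, threshold = select_important(scores)
print(threshold)  # 2.0
print(kept)       # ['heart_rate', 'qrs_duration']
```

Using the mean as the cut-off needs no tuning, at the cost of retaining a different number of variables for each training run.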

Experimental Results and Discussion
We performed experiments on the classifier we developed to assess its predictive performance. We enumerate the steps of the algorithm for classification using RF below:

1. Read comma-separated values of Arrhythmia data from text file as table.
2. Identify and create a response variable showing which class datapoints belong to (280th column of original data read as table).
3. Make sure data is complete:
   - Identify the columns with missing values.
   - Replace the missing values (occurring as "?") with NA (required for imputation).
4. Assign names of the variables (for ease of identification).
5. Remove the variables with all-zero entries; the non-ECG variables age, sex, height, and weight; and the one specifying the arrhythmia type. (For imputation, we cannot afford to retain so many variables with so few records. One of the imputation methods used, Amelia, does not permit it.)
6. Perform imputation with rfImpute/Amelia.
7. Sample imputed data judiciously (as described previously) from respective classes up to the maximum number of records it contains except for the Normal class (code 01). For this class try out number of records 100, 90, 80, and 70.
   - Toss a biased coin to generate indices between 1 and number of records (rows) in the ratio 70:30.
   - Generate training and test set using above indices.
8. Call Random Forest with imputed data and number of trees = 500 and other parameters.
9. Call Predict function on the test set of data.
10. Identify the important variables according to the specified criterion (MeanDecreaseGini) at specified threshold value (set equal to the mean of MeanDecreaseGini).
11. Call Random Forest with important variables and training set of data and number of trees = 500 and other parameters.
12. Call Predict function on the test set of data.
13. Go back to step 7 until the list (100, 90, 80, 70) is exhausted.
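
The data-preparation steps (reading the file, marking "?" as missing, and the 70:30 biased-coin split) can be sketched as follows. The study did this in R; the Python helpers, the three-row sample, and the seed below are assumptions made purely for illustration, and imputation and model fitting are not shown.

```python
import csv
import io
import random


def read_records(text):
    """Parse comma-separated rows, turning the '?' marker into None (the role NA plays in R)."""
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        rows.append([None if cell == "?" else float(cell) for cell in raw])
    return rows


def columns_with_missing(rows):
    """Return 1-based column numbers that contain at least one missing value."""
    return sorted({j + 1 for row in rows for j, cell in enumerate(row) if cell is None})


def biased_coin_split(n_rows, train_share, rng):
    """Assign each row index to train with probability train_share, else to test."""
    train, test = [], []
    for i in range(n_rows):
        (train if rng.random() < train_share else test).append(i)
    return train, test


# Tiny made-up sample: a few feature columns followed by the class code.
sample = "75,0,190,80,?,1\n56,1,165,64,63,6\n54,0,172,95,?,10\n"
rows = read_records(sample)
print(columns_with_missing(rows))            # [5]
train, test = biased_coin_split(len(rows), 0.7, random.Random(0))
print(len(train) + len(test) == len(rows))   # True
```

Because each row is assigned by an independent coin toss, the realized split is only approximately 70:30, especially on small samples.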
Table 1 shows the computation of precision and recall, which are defined as follows:

Precision: the number of correctly classified examples of a particular class divided by the number of examples labeled by the system as belonging to that particular class.10

\[ \text{precision} = \frac{|\{\text{correct-labels}\} \cap \{\text{predicted-labels}\}|}{|\text{predicted-labels}|} \]

Recall (sensitivity): the number of correctly classified examples of a particular class divided by the number of examples of that particular class in the data.

\[ \text{recall} = \frac{|\{\text{correct-labels}\} \cap \{\text{predicted-labels}\}|}{|\text{correct-labels}|} \]

F-score: a combination of the above two measures in the form of harmonic mean.

\[ \text{F-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]

As the system keeps operating in the field, more records for the various cases will be collected, together with the cardiologists' decisions for the respective records. A new RF classifier may be trained with these data, and finally it can be combined incrementally with the one currently operating, using the combine() function of randomForest.

Table 1: Precision/Recall Computation
Per-class precision, recall, and F-score (in percent, as defined above) for each number of records sampled from the major class.

Records  Metric     Class 1  Class 2  Class 3  Class 4  Class 5  Class 6  Class 9  Class 10
100      Precision  96.43    66.67    100.0    50.0     33.33    85.71    100.0    62.50
100      Recall     58.69    71.40    75.0     100.0    100.0    75.00    100.0    66.66
100      F-score    72.97    68.97    85.71    66.7     49.96    79.99    100.0    64.53
90       Precision  78.26    72.73    75.0     33.33    33.33    87.5     66.67    80.00
90       Recall     52.94    53.33    50.0     50.00    100.0    87.5     100.0    54.54
90       F-score    63.15    61.54    60.0     39.98    49.96    87.5     79.99    64.86
80       Precision  89.29    71.43    66.67    50.0     50.0     75.00    100.0    55.56
80       Recall     78.12    55.55    100      50.0     100.0    90.00    100.0    83.33
80       F-score    83.33    62.53    80.0     50.0     66.67    81.82    100.0    66.71
70       Precision  89.47    75.00    0.00     83.33    33.33    80.00    100      86.67
70       Recall     65.38    70.59    0.00     83.33    100.0    88.80    100      68.42
70       F-score    75.55    72.73    0.00     83.33    49.96    84.17    100      76.47

The last two columns of the table, 100 - OOB-error with all variables and with important variables, list the values 67.29, 70.10, 71.01, 72.46, 64.50, 66.50, 61.66, and 64.25.
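
The precision, recall, and F-score definitions used for Table 1 can be sketched as below. The function names and the toy label lists are illustrative assumptions; only the pair (96.43, 58.69) is taken from Table 1, and it is used to re-derive the F-score reported for Class 1 in the first row.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(true_labels, predicted_labels, cls):
    """Per-class precision, recall, and F-score, each in percent."""
    pairs = list(zip(true_labels, predicted_labels))
    hits = sum(1 for t, p in pairs if t == cls and p == cls)
    n_predicted = sum(1 for _, p in pairs if p == cls)
    n_actual = sum(1 for t, _ in pairs if t == cls)
    precision = 100.0 * hits / n_predicted if n_predicted else 0.0
    recall = 100.0 * hits / n_actual if n_actual else 0.0
    return precision, recall, f_score(precision, recall)


# Re-derive the Class 1 F-score in the first row of Table 1.
print(round(f_score(96.43, 58.69), 2))   # 72.97

# Toy labels (made up): class 1 is predicted three times, two of them correctly.
truth = [1, 1, 1, 2, 2]
guess = [1, 1, 2, 1, 2]
print(precision_recall_f(truth, guess, 1))
```

Returning 0 when a class is never predicted mirrors the 0.00 entries for Class 3 in the last rows of Table 1, although how the study handled that case is an assumption here.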

Implementation of R-based Classifier for Real-Time Analysis

"R" code can be executed from within a bash script, which allows us to invoke it from within a Java program (or any other programming language or script, for that matter). Storm is an open-source real-time computation framework that allows us to process streams of data in parallel, making it a very good choice for classification of data on a cluster of nodes. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation.

The model file created in the previous step is referenced in another "R" script, which is used for real-time classification. Data to be classified enters the Storm framework via a spout, which then emits it to the bolts. Each bolt runs the "R" script in parallel and emits the results of the classification (which can be captured and used as needed), as shown in Fig. 3.

Note that for each result in Table 2, one node is a Nimbus node and the remaining nodes are supervisors. Each node has 8 quad-core CPUs, 32 GB of RAM, and 32 GB of swap space.
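
The idea of calling an R script from another program, as each bolt does, can be sketched with a subprocess call to the `Rscript` command-line tool. This is only an illustration in Python rather than the Java and bash setup described above; the script name `classify.R`, the model file `rf_model.RData`, and the convention of passing one record as a comma-separated argument are all hypothetical.

```python
import subprocess


def build_r_command(script_path, model_path, record):
    """Build the argument list for one classification call (script layout is assumed)."""
    fields = ",".join(str(value) for value in record)
    return ["Rscript", script_path, model_path, fields]


def classify_with_r(script_path, model_path, record, timeout_s=30):
    """Run the R script for one record and return whatever it prints to stdout."""
    result = subprocess.run(
        build_r_command(script_path, model_path, record),
        capture_output=True, text=True, check=True, timeout=timeout_s,
    )
    return result.stdout.strip()


# Only the command is built here; running it needs R and the two files to exist.
print(build_r_command("classify.R", "rf_model.RData", [63, 0.12, 155]))
```

Starting a new R process per record is simple but costly; a long-running R process per bolt would be a natural alternative, though the paper does not say which approach was used.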

Table 2. ECG Classification Performance Analysis
Time taken (in seconds)

Number of predictions      Sequential processing  Storm cluster with 2 nodes  Storm cluster with 3 nodes
(ECG categorizations)      (no Storm used)        (1 spout, 8 bolts)          (1 spout, 16 bolts)
20K                        3,600                  900                         450
40K                        7,200                  1,710                       900
0.1 million                18,300                 4,440                       2,400

Note: We made use of only one spout for this proof of concept (POC). Depending on how data enters the Storm framework, it is possible to use multiple spouts, which would enhance performance further.
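
The timings in Table 2 can be turned into speedups and throughput with a short calculation. The numbers are copied from Table 2; the names `TIMINGS`, `speedups`, and `throughput` are illustrative.

```python
# Seconds from Table 2: (sequential, 2-node Storm cluster, 3-node Storm cluster).
TIMINGS = {
    20_000: (3600, 900, 450),
    40_000: (7200, 1710, 900),
    100_000: (18300, 4440, 2400),
}


def speedups(timings):
    """Speedup of each Storm configuration relative to sequential processing."""
    return {
        n: (seq / two_nodes, seq / three_nodes)
        for n, (seq, two_nodes, three_nodes) in timings.items()
    }


def throughput(n_predictions, seconds):
    """Predictions completed per second."""
    return n_predictions / seconds


for n, (s2, s3) in speedups(TIMINGS).items():
    print(n, round(s2, 2), round(s3, 2))
print(round(throughput(100_000, 2400), 1))   # predictions per second on the 3-node cluster
```

With 8 bolts the speedup over sequential processing is roughly 4x, and with 16 bolts roughly 8x, so doubling the number of bolts roughly doubles the throughput in these measurements.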

Concluding Remarks
This article has presented a real-time machine-learning platform for the healthcare domain that allows ECG signals to be classified. It is an additional input for the physician, but a crucial one that facilitates care-for-value. This work provides the basis for building a powerful analytical framework that can work in real time. It could prove extremely useful, not only for ECG classification, but also for enabling physicians to get incremental analytics on the various kinds of patient data increasingly available in EHRs. Our study also enables incremental healthcare, where the focus can shift to analytics, and consequently, to customized real-time healthcare. The upcoming health exchanges may also benefit, as on-the-fly analytics on high-velocity data becomes essential for providers, physicians, and patients equally.

Author Disclosure Statement
All authors are employed by Impetus.

FIG. 3. Running R over Storm.

References

1. Gao D, Madden M, Chambers D, Lyons G. Bayesian ANN classifier for ECG arrhythmia diagnostic system: A comparison study. Proceedings of 2005 IEEE International Joint Conference on Neural Networks (IJCNN '05) 2005; 4:2383–2388.
2. Rothman SA, et al. The diagnosis of cardiac arrhythmias: A prospective multi-center randomized study comparing mobile cardiac outpatient telemetry versus standard loop event monitoring. J Cardiovasc Electrophysiol 2007; 8:1–7.
3. Guvenir HA, Acar S, Demiroz G, Cekin A. A supervised machine learning algorithm for arrhythmia analysis. Comput Cardiol 1997; 7:433–436.
4. Breiman L. Random Forests. Mach Learn 2001; 45:5–32.
5. Liaw A. Missing value imputations by randomForest. R documentation. Available online at http://rss.acs.unt.edu/Rdoc/library/randomForest/html/rfImpute.html. (Last accessed on September 6, 2013).
6. Ishioka T. Imputation of missing values for unsupervised data using the proximity in random forests. In: Proceedings of The Fifth International Conference on Mobile, Hybrid, and On-line Learning. Nice, France, February 24–March 1, 2013.
7. Honaker J, King G, Blackwell M. AMELIA II: A program for missing data. J Stat Softw 2011; 45:1–47.
8. Blagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics 2013; 14:106. Available online at www.biomedcentral.com/1471-2105/14/106. (Last accessed on September 6, 2013).
9. Calle ML, Urrea V. Letter to the editor: Stability of random forest importance measures. Briefings Bioinf 2011; 12:86–89.
10. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manag 2009; 45:427–437.
Address correspondence to:
Vijay Srinivas Agneeswaran, PhD
Innovation Labs
Impetus Infotech India Private Limited
Pritech Park SEZ, Bellandur Outer Ring Road
Bangalore, Karnataka 560103
India
E-mail: vijay.sa@impetus.co.in

Patient Monitoring Equipment Market shares anticipated to reach $26.2 billion...LeeSam111
 
IRJET - Cloud based Enhanced Cardiac Disease Prediction using NaĂŻve Bayesian ...
IRJET - Cloud based Enhanced Cardiac Disease Prediction using NaĂŻve Bayesian ...IRJET - Cloud based Enhanced Cardiac Disease Prediction using NaĂŻve Bayesian ...
IRJET - Cloud based Enhanced Cardiac Disease Prediction using NaĂŻve Bayesian ...IRJET Journal
 
Predicting Heart Disease Using Machine Learning Algorithms.
Predicting Heart Disease Using Machine Learning Algorithms.Predicting Heart Disease Using Machine Learning Algorithms.
Predicting Heart Disease Using Machine Learning Algorithms.IRJET Journal
 
Prediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewPrediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewIRJET Journal
 
Zigbee based wearable remote healthcare monitoring system for elderly patients
Zigbee based wearable remote healthcare monitoring system for elderly patientsZigbee based wearable remote healthcare monitoring system for elderly patients
Zigbee based wearable remote healthcare monitoring system for elderly patientsijwmn
 
Mining Health Examination Records A Graph Based Approach
Mining Health Examination Records A Graph Based ApproachMining Health Examination Records A Graph Based Approach
Mining Health Examination Records A Graph Based Approachijtsrd
 
A Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic RegressionA Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic Regressionijtsrd
 
A New Real Time Clinical Decision Support System Using Machine Learning for C...
A New Real Time Clinical Decision Support System Using Machine Learning for C...A New Real Time Clinical Decision Support System Using Machine Learning for C...
A New Real Time Clinical Decision Support System Using Machine Learning for C...IRJET Journal
 
WBSN based safe lifestyle: a case study of heartrate monitoring system
WBSN based safe lifestyle: a case study of heartrate monitoring system WBSN based safe lifestyle: a case study of heartrate monitoring system
WBSN based safe lifestyle: a case study of heartrate monitoring system IJECEIAES
 
IRJET- Cardiovascular Disease Prediction using Machine Learning Techniques
IRJET- Cardiovascular Disease Prediction using Machine Learning TechniquesIRJET- Cardiovascular Disease Prediction using Machine Learning Techniques
IRJET- Cardiovascular Disease Prediction using Machine Learning TechniquesIRJET Journal
 

Ähnlich wie Real-time Analytics for the Healthcare Industry: Arrythmia Detection- Impetus Article (20)

Heart Failure Prediction using Different Machine Learning Techniques
Heart Failure Prediction using Different Machine Learning TechniquesHeart Failure Prediction using Different Machine Learning Techniques
Heart Failure Prediction using Different Machine Learning Techniques
 
Analysis of Heart Rate Variability Via Health Care Platform
Analysis of Heart Rate Variability Via Health Care PlatformAnalysis of Heart Rate Variability Via Health Care Platform
Analysis of Heart Rate Variability Via Health Care Platform
 
IRJET - Digital Assistance: A New Impulse on Stroke Patient Health Care using...
IRJET - Digital Assistance: A New Impulse on Stroke Patient Health Care using...IRJET - Digital Assistance: A New Impulse on Stroke Patient Health Care using...
IRJET - Digital Assistance: A New Impulse on Stroke Patient Health Care using...
 
How Modern Cardiologists Are Overcoming HIT Challenges
How Modern Cardiologists Are Overcoming HIT ChallengesHow Modern Cardiologists Are Overcoming HIT Challenges
How Modern Cardiologists Are Overcoming HIT Challenges
 
Detection and Classification of ECG Arrhythmia using LSTM Autoencoder
Detection and Classification of ECG Arrhythmia using LSTM AutoencoderDetection and Classification of ECG Arrhythmia using LSTM Autoencoder
Detection and Classification of ECG Arrhythmia using LSTM Autoencoder
 
IRJET- A Survey on Classification and identification of Arrhythmia using Mach...
IRJET- A Survey on Classification and identification of Arrhythmia using Mach...IRJET- A Survey on Classification and identification of Arrhythmia using Mach...
IRJET- A Survey on Classification and identification of Arrhythmia using Mach...
 
IRJET- A Prediction Engine for Influenza Pandemic using Healthcare Analysis
IRJET- A Prediction Engine for Influenza  Pandemic using Healthcare AnalysisIRJET- A Prediction Engine for Influenza  Pandemic using Healthcare Analysis
IRJET- A Prediction Engine for Influenza Pandemic using Healthcare Analysis
 
Multiple disease prediction using Machine Learning Algorithms
Multiple disease prediction using Machine Learning AlgorithmsMultiple disease prediction using Machine Learning Algorithms
Multiple disease prediction using Machine Learning Algorithms
 
ç‚șæ­é†«é™ą 20070913
ç‚șæ­é†«é™ą 20070913ç‚șæ­é†«é™ą 20070913
ç‚șæ­é†«é™ą 20070913
 
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
 
Patient Monitoring Equipment Market shares anticipated to reach $26.2 billion...
Patient Monitoring Equipment Market shares anticipated to reach $26.2 billion...Patient Monitoring Equipment Market shares anticipated to reach $26.2 billion...
Patient Monitoring Equipment Market shares anticipated to reach $26.2 billion...
 
IRJET - Cloud based Enhanced Cardiac Disease Prediction using NaĂŻve Bayesian ...
IRJET - Cloud based Enhanced Cardiac Disease Prediction using NaĂŻve Bayesian ...IRJET - Cloud based Enhanced Cardiac Disease Prediction using NaĂŻve Bayesian ...
IRJET - Cloud based Enhanced Cardiac Disease Prediction using NaĂŻve Bayesian ...
 
Predicting Heart Disease Using Machine Learning Algorithms.
Predicting Heart Disease Using Machine Learning Algorithms.Predicting Heart Disease Using Machine Learning Algorithms.
Predicting Heart Disease Using Machine Learning Algorithms.
 
Prediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewPrediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A Review
 
Zigbee based wearable remote healthcare monitoring system for elderly patients
Zigbee based wearable remote healthcare monitoring system for elderly patientsZigbee based wearable remote healthcare monitoring system for elderly patients
Zigbee based wearable remote healthcare monitoring system for elderly patients
 
Mining Health Examination Records A Graph Based Approach
Mining Health Examination Records A Graph Based ApproachMining Health Examination Records A Graph Based Approach
Mining Health Examination Records A Graph Based Approach
 
A Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic RegressionA Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic Regression
 
A New Real Time Clinical Decision Support System Using Machine Learning for C...
A New Real Time Clinical Decision Support System Using Machine Learning for C...A New Real Time Clinical Decision Support System Using Machine Learning for C...
A New Real Time Clinical Decision Support System Using Machine Learning for C...
 
WBSN based safe lifestyle: a case study of heartrate monitoring system
WBSN based safe lifestyle: a case study of heartrate monitoring system WBSN based safe lifestyle: a case study of heartrate monitoring system
WBSN based safe lifestyle: a case study of heartrate monitoring system
 
IRJET- Cardiovascular Disease Prediction using Machine Learning Techniques
IRJET- Cardiovascular Disease Prediction using Machine Learning TechniquesIRJET- Cardiovascular Disease Prediction using Machine Learning Techniques
IRJET- Cardiovascular Disease Prediction using Machine Learning Techniques
 

Mehr von Impetus Technologies

Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...
Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...
Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...Impetus Technologies
 
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix WebinarFuture-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix WebinarImpetus Technologies
 
Building Real-time Streaming Apps in Minutes- Impetus Webinar
Building Real-time Streaming Apps in Minutes- Impetus WebinarBuilding Real-time Streaming Apps in Minutes- Impetus Webinar
Building Real-time Streaming Apps in Minutes- Impetus WebinarImpetus Technologies
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Impetus Technologies
 
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus White Paper- Handling  Data Corruption  in ElasticsearchImpetus White Paper- Handling  Data Corruption  in Elasticsearch
Impetus White Paper- Handling Data Corruption in ElasticsearchImpetus Technologies
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarImpetus Technologies
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarImpetus Technologies
 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Impetus Technologies
 
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...Impetus Technologies
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Impetus Technologies
 
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...Impetus Technologies
 
Enterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus WebcastEnterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus WebcastImpetus Technologies
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Impetus Technologies
 
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...Impetus Technologies
 
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...Impetus Technologies
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabImpetus Technologies
 
Webinar maturity of mobile test automation- approaches and future trends
Webinar  maturity of mobile test automation- approaches and future trendsWebinar  maturity of mobile test automation- approaches and future trends
Webinar maturity of mobile test automation- approaches and future trendsImpetus Technologies
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labImpetus Technologies
 
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...Impetus Technologies
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastImpetus Technologies
 

Mehr von Impetus Technologies (20)

Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...
Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...
Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...
 
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix WebinarFuture-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
 
Building Real-time Streaming Apps in Minutes- Impetus Webinar
Building Real-time Streaming Apps in Minutes- Impetus WebinarBuilding Real-time Streaming Apps in Minutes- Impetus Webinar
Building Real-time Streaming Apps in Minutes- Impetus Webinar
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
 
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus White Paper- Handling  Data Corruption  in ElasticsearchImpetus White Paper- Handling  Data Corruption  in Elasticsearch
Impetus White Paper- Handling Data Corruption in Elasticsearch
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
 
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
 
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
 
Enterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus WebcastEnterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus Webcast
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
 
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
 
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLab
 
Webinar maturity of mobile test automation- approaches and future trends
Webinar  maturity of mobile test automation- approaches and future trendsWebinar  maturity of mobile test automation- approaches and future trends
Webinar maturity of mobile test automation- approaches and future trends
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus Webcast
 

KĂŒrzlich hochgeladen

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

KĂŒrzlich hochgeladen (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Real-time Analytics for the Healthcare Industry: Arrhythmia Detection - Impetus Article

  • 1. INDUSTRY EXPERIENCE REAL-TIME ANALYTICS FOR THE HEALTHCARE INDUSTRY: ARRHYTHMIA DETECTION Vijay Srinivas Agneeswaran, Joydeb Mukherjee, Ashutosh Gupta, Pranay Tonpay, Jayati Tiwari, and Nitin Agarwal Impetus Infotech Private Limited, Bangalore, Karnataka, India Abstract It is time for the healthcare industry to move from the era of ‘‘analyzing our health history’’ to the age of ‘‘managing the future of our health.’’ In this article, we illustrate the importance of real-time analytics across the healthcare industry by providing a generic mechanism to reengineer traditional analytics expressed in the R programming language into Storm-based real-time analytics code. This is a powerful abstraction, since most data scientists use R to write the analytics and are not clear on how to make the data work in real-time and on highvelocity data. Our paper focuses on the applications necessary to a healthcare analytics scenario, speciïŹcally focusing on the importance of electrocardiogram (ECG) monitoring. A physician can use our framework to compare ECG reports by categorization and consequently detect Arrhythmia. The framework can read the ECG signals and uses a machine learning-based categorizer that runs within a Storm environment to compare different ECG signals. The paper also presents some performance studies of the framework to illustrate the throughput and accuracy trade-off in real-time analytics. Introduction The healthcare industry is undergoing a major transformation. The old days of using paper records of patients’ data are gone with the digitization of healthcare information, starting with the use of electronic health records (EHRs). The use of EHRs is becoming widespread, partly dictated by ïŹnancial stimulus and partly by governmental regulations. The healthcare industry is now turning to the use of data analytics. 
The pace is likely to pick up with the advent of the Affordable Care Act (ACA), or "Obamacare," which promises to transform the healthcare industry from fee-for-service to fee-for-value. Moreover, because the eligibility requirements have been widened and care has become more affordable, more people will come into the healthcare system. This implies the need for big-data analytics, especially for the mandated health exchanges.

The Affordable Care Act has also spurred many innovations in healthcare. This is evident in the number of healthcare startups funded recently, such as the following (this list is only indicative, not intended to be complete, and is biased toward health analytics):
1. Health Catalyst, which provides an analytics suite to analyze EHRs.
2. xG Health Solutions, which provides analytics of population health as well as reporting and interpretation.

Editor's Note: Impetus supports multiple venues for dialogue in big data, providing thought leadership and services to create new ways to analyze data to gain key opportunities in business and industry across enterprises. The following is a description of one potential application of their expertise in machine learning within the healthcare space.

BIG DATA, SEPTEMBER 2013. DOI: 10.1089/big.2013.0018
  • 2. 3. Lumeris, which uses real-time analytics of healthcare data to improve patient care, essentially focused on making the ACA work for all players, including health systems, payers, and providers.
4. Eviti, which provides physicians with actionable information using analytics for cancer-related decision making.
5. Humedica, which uses data from multiple sources, including EHRs and claims data, to help healthcare providers analyze patient data as well as population data.
6. HealthTap, which provides a social platform for physicians and patients to share information as well as build a peer reputation.

Startup accelerators in this space include Nanthealth, Rock Health, Healthbox, and Blueprint Health Services, among others.

This article presents a different scenario requiring real-time analytics of big data and, as an example, applies cutting-edge big data technologies to historical data. The electrocardiogram (ECG) signal provides critical information about the heart activity of a patient. Continuous monitoring of ECG is important when a patient is ambulatory or at the bedside. It is very important to treat arrhythmic patients on time, as delays can lead to potentially fatal complications.1

Arrhythmia detection from ECG signals is a well-studied problem. For instance, Gao et al.1 solve it by using an artificial neural network approach based on a Bayesian framework. Rothman and colleagues2 compare two common devices, the loop event monitor and the mobile cardiac outpatient telemetry system, and their effectiveness in detecting arrhythmias.

Machine Learning–Based Classification of ECG Data
The classifier we have developed works in two modes: the training mode (or learning mode) and the operational mode (or advisory mode). In the training mode, we extract features (i.e., variables or transformed variables) in terms of which arrhythmia types, including the absence of arrhythmia, can be represented, and we learn the parameters of the inference mechanism about the occurrence or nonoccurrence of a type of arrhythmia. In this mode, the results cannot advise the doctor; rather, the label provided with each record (i.e., the type of arrhythmia or its absence) is used for training (see Fig. 1).

Once the training is complete, the classifier goes into operational mode, meaning it begins advising the doctor on new, unseen cases that are similar to those seen during training. The doctor arrives at an inference about the presence or absence of arrhythmia taking the output of the classifier into consideration. Also, if arrhythmia is present, its type can be suggested by the classifier.

FIG. 1. ML Based Classification of ECG Data: Training Mode.

MARY ANN LIEBERT, INC. VOL. 1 NO. 3, SEPTEMBER 2013, BIG DATA

  • 3. The various types of arrhythmia classes (labels) will be listed in a subsequent section. This mode of operation is depicted in Fig. 2.

FIG. 2. ML Based Classification of ECG Data: Operational Mode.

The input to the machine learning algorithm is a set of historic patient records. Clinical measurements recorded in the past from ECG signals, namely QRS duration and the RR, P-R, and Q-T intervals, constitute such records, along with information such as gender, age, and weight. This data is padded with the categorical label a cardiologist had assigned to each record, such as "normal" or one of the 15 types of pathology categories. These make up a total of 279 features as enumerated by Guvenir et al.3

Class names and description
Class distribution (Database: Arrhythmia)

Class code | Class | Number of instances
01 | Normal | 245
02 | Ischemic changes (Coronary Artery Disease) | 44
03 | Old Anterior Myocardial Infarction | 15
04 | Old Inferior Myocardial Infarction | 15
05 | Sinus tachycardy | 13
06 | Sinus bradycardy | 25
07 | Ventricular Premature Contraction (PVC) | 3
08 | Supraventricular Premature Contraction | 2
09 | Left bundle branch block | 9
10 | Right bundle branch block | 50
11 | 1. degree AtrioVentricular block | 0
12 | 2. degree AV block | 0
13 | 3. degree AV block | 0
14 | Left ventricule hypertrophy | 4
15 | Atrial Fibrillation or Flutter | 5
16 | Others | 22

Description of dataset
We analyzed a dataset containing 452 records belonging to patients of different ages, weights, heights, and genders (see http://archive.ics.uci.edu/ml/datasets/Arrhythmia for more information). There are in all 280 variables, including the arrhythmia class type as the 280th column, in the database downloaded from the source.1 Values for this column can be 1 to 16, representing one of the codes enumerated above. There are 5 categorical variables and 274 numeric variables. Five variables had missing values in their records, as enumerated below. These variables occurred in columns 11 to 15 in the original dataset.

Vector angles in degrees on front plane of:
11 T: 8 values missing
12 P: 22 values missing
13 QRST: 1 value missing
14 J: 376 values missing
Number of heart beats per minute:
15 Heart rate: 1 value missing

Some of the variables had "0" throughout the column (i.e., across all records). Those variables are enumerated below with their column number followed by the variable name:
20 "DI S-prime Wave"
68 "AVL S-prime Wave"
70 "AVL Existence of ragged R wave"
84 "AVF Existence of ragged P wave"
132 "V4 Existence of ragged P wave"
133 "V4 Existence of diphasic derivation of P wave"
140 "V5 S-prime Wave"
142 "V5 Existence of ragged R wave"
144 "V5 Existence of ragged P wave"
146 "V5 Existence of ragged T wave"
  • 4. 152 "V6 S-prime Wave"
157 "V6 Existence of diphasic derivation of P wave"
158 "V6 Existence of ragged T wave"
165 "DI Amplitude S-prime Wave"
205 "AVL Amplitude S-prime Wave"
265 "V5 Amplitude S-prime Wave"
275 "V6 Amplitude R-prime Wave"

The first four columns in the original data file had non-ECG variables as follows:
1 Age: Age in years, linear
2 Sex: Sex (0 = male; 1 = female), nominal
3 Height: Height in centimeters, linear
4 Weight: Weight in kilograms, linear

Classification algorithm
We chose the random forest (RF) classifier2 for several reasons: it is fast to train; its OOB error (out-of-bag error) is a good estimate of generalization error; it can handle noisy data; it can suggest "important variables," using which a parsimonious predictive model can be built; and it has an associated imputation method, which is at times a better choice than external imputation methods. Additionally, two or more separately trained RFs can be combined without incurring much computational expenditure, and it is an ensemble classifier (i.e., a collection of classifiers), which predicts by counting the votes cast by each classifier for a class on a query record. The predictive performance of an ensemble classifier is typically better than that of any of its constituents.

The constituent classifiers for RF are classification trees. The advantage of using such classifiers is that individual classifiers may be barely accurate (slightly better than random guessing), but combining trees may produce classifiers with much higher accuracy. Also, a great deal of variance may be present as we move from one tree to another, but the overall classifier's variance is reduced because of the averaging that takes place in the course of "ensembling." RF is trained by bagging (bootstrapped aggregation) of training data.
Random samples "with replacement" are drawn from the training data and classification trees are built using them. When the training set is large, each bootstrap sample contains about (1 – 1/e), or roughly 63%, of the distinct original records; the remaining 37% or so are used for testing the corresponding trees to calculate OOB errors. It can be shown that this error is a fair indication of generalization error for the RF classifier. Generalization error measures the predictive performance of a classifier when tested with unseen data outside of the training set but supposedly generated from the same distribution as that of the training data. These will be the kind of data encountered by the classifier in the operational mode.

The keys to the predictive performance of the RF classifier are the strength of individual classifiers and the diversity (degree of uncorrelatedness) of the constituent classification trees in the forest in terms of raw margin functions.4

Imputation of missing values
In the exploratory data analysis (EDA) phase, it was found that important variables such as "heart rate," measured as the number of heart beats per minute, had some missing values. A couple of imputation algorithms were tried out,5,6 and finally rfImpute from the randomForest package was chosen to impute those missing values. Amelia7 was not adopted because it could produce imputed results only with a high value of prior information [with the "empri" parameter set as high as 0.9*nrow(data), when usually 0.01*nrow(data) is used]. Such a high prior amounts to adding a lot of artificial observations with the same mean and variance as the existing observations but with 0 covariance.

Imbalance of data with respect to classes
The gross imbalance in the dataset (Table 1) poses problems for selecting a subset of data to be used for training and testing.
If the training and testing sets are partitioned in the usual way (70%–80% for training and 20%–30% for testing), the measured classification performance will be misleading. There are several ways to partially address this problem. Generating artificial data for the minor classes (via SMOTE algorithms and associated packages)8 is one method. Another is to down-sample data from the major class. We have chosen the latter path [i.e., subsampling the major class (Normal class) in proportion to the minor class (those classes that had at least 10% of the data)]. While subsampling the major class, we made sure that its maximum number of records did not exceed 100% of that of the minor class. Furthermore, weights were used for the training examples supplied to the RF classifier. Classes that had single-digit representation, namely Left ventricule hypertrophy (0.9%), Atrial Fibrillation or Flutter (1.1%), Ventricular Premature Contraction (PVC) (0.7%), Supraventricular Premature Contraction (0.4%), and Left bundle branch block (1.9%), were not addressed.

Variable selection for model building
Variable selection plays a major role in the development of predictive models. In this study, one of the reasons for selecting the RF classifier over other alternatives was that it has a means of assessing the effectiveness of each variable occurring in the model, which we can use to build a parsimonious model for deployment. The criteria by which RF ranks its "Important Variables" are "Mean Decrease Accuracy" and "Mean Decrease Gini." We prefer the latter for selecting the important variables, because in some instances in the literature it has been reported that the other measure is not stable.9 All variables with a Mean Decrease Gini value greater than the mean of that measure across all variables are retained in the model; that is, the criterion threshold is set to the mean value (see Fig. 2).
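The thresholding rule described above can be sketched in a few lines. This is an illustrative Python sketch only, not the authors' R code: the function name select_important and the importance scores are hypothetical, and in the R workflow the scores would instead come from the fitted random forest's Mean Decrease Gini measure.

```python
from statistics import mean


def select_important(importance):
    """Keep variables whose importance score exceeds the mean score.

    `importance` maps a variable name to its importance score
    (standing in for Mean Decrease Gini in this sketch).
    """
    threshold = mean(importance.values())
    kept = [name for name, score in importance.items() if score > threshold]
    return threshold, kept


# Hypothetical scores, for illustration only (not taken from the paper).
importance = {
    "heart_rate": 6.0,
    "qrs_duration": 4.0,
    "qt_interval": 1.5,
    "t_angle": 0.5,
}

threshold, kept = select_important(importance)
print(threshold, kept)
```

Setting the threshold to the mean keeps the rule free of tuning parameters, at the cost of tying the number of retained variables to how skewed the importance scores are.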
The complete variable list with descriptions is provided in the online reference (http://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/arrhythmia.names).

Experimental Results and Discussion
We performed experiments on the classifier we developed to assess its predictive performance. We enumerate the steps of the algorithm for classification using RF below:
  • 5. 1. Read comma-separated values of Arrhythmia data from the text file as a table.
2. Identify and create a response variable showing which class the data points belong to (280th column of the original data read as a table).
3. Make sure the data is complete: identify the columns with missing values and replace the missing values (occurring as "?") with NA (required for imputation).
4. Assign names to the variables (for ease of identification).
5. Get rid of the variables with zero entries, of age, sex, height, and weight (i.e., the non-ECG values), and of the one specifying the arrhythmia type. (For imputation, we cannot afford to retain so many variables with so few records. One of the imputation methods used, Amelia, does not permit it.)
6. Perform imputation with rfImpute/Amelia.
7. Sample the imputed data judiciously (as described previously) from the respective classes, up to the maximum number of records each contains, except for the Normal class (code 01). For this class, try out 100, 90, 80, and 70 records. Toss a biased coin to generate indices between 1 and the number of records (rows) in the ratio 70:30. Generate the training and test sets using these indices.
8. Call Random Forest with the imputed data, number of trees = 500, and other parameters.
9. Call the Predict function on the test set.
10. Identify the important variables according to the specified criterion (MeanDecreaseGini) at the specified threshold value (set equal to the mean of MeanDecreaseGini).
11. Call Random Forest with the important variables, the training set, number of trees = 500, and other parameters.
12. Call the Predict function on the test set.
13. Go back to step 7 until the list (100, 90, 80, 70) is exhausted.
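Step 7 of the procedure above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R implementation: the helper names cap_major_class and biased_coin_split are hypothetical, the records are synthetic stand-ins that reuse three class counts from the dataset description (245 Normal, 44 class 02, 50 class 10), and the "biased coin" is interpreted as an independent 70% chance per record of landing in the training set.

```python
import random


def cap_major_class(records, major_label, cap, rng):
    """Down-sample the major class to at most `cap` records; keep the rest whole."""
    major = [r for r in records if r["label"] == major_label]
    others = [r for r in records if r["label"] != major_label]
    if len(major) > cap:
        major = rng.sample(major, cap)  # sampling without replacement
    return major + others


def biased_coin_split(records, train_prob, rng):
    """Assign each record to train with probability `train_prob`, else to test."""
    train, test = [], []
    for record in records:
        (train if rng.random() < train_prob else test).append(record)
    return train, test


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic stand-in records; only the class counts come from the dataset.
records = [{"id": i, "label": 1} for i in range(245)]
records += [{"id": 245 + i, "label": 2} for i in range(44)]
records += [{"id": 289 + i, "label": 10} for i in range(50)]

for cap in (100, 90, 80, 70):
    sampled = cap_major_class(records, major_label=1, cap=cap, rng=rng)
    train, test = biased_coin_split(sampled, train_prob=0.7, rng=rng)
    print(cap, len(sampled), len(train), len(test))
```

Because each record is assigned independently, the realized split is only approximately 70:30; an exact split would instead shuffle the records and cut at 70% of the length.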
Table 1 reports precision, recall, and F-score for each class, which are defined as follows.10

Recall (sensitivity): the number of correctly classified examples of a particular class divided by the number of examples of that particular class in the data.

\[ \text{recall} = \frac{|\{\text{correct labels}\} \cap \{\text{predicted labels}\}|}{|\{\text{correct labels}\}|} \]

Precision: the number of correctly classified examples of a particular class divided by the number of examples labeled by the system as belonging to that particular class.

\[ \text{precision} = \frac{|\{\text{correct labels}\} \cap \{\text{predicted labels}\}|}{|\{\text{predicted labels}\}|} \]

F-score: a combination of the above two measures in the form of a harmonic mean.

\[ \text{F-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]

As the system keeps operating in the field, more records for the various cases will be collected, together with the cardiologists' decisions for the respective records. A new RF classifier may be trained with these data and then combined incrementally with the one currently operating, using the combine() function of randomForest.

Table 1: Precision/Recall Computation
Each cell gives precision / recall / F-score (in %) for the given number of records in the major (Normal) class.

Class | 100 records | 90 records | 80 records | 70 records
Class 1 | 96.43 / 58.69 / 72.97 | 78.26 / 52.94 / 63.15 | 89.29 / 78.12 / 83.33 | 89.47 / 65.38 / 75.55
Class 2 | 66.67 / 71.40 / 68.97 | 72.73 / 53.33 / 61.54 | 71.43 / 55.55 / 62.53 | 75.00 / 70.59 / 72.73
Class 3 | 100.0 / 75.0 / 85.71 | 75.0 / 50.0 / 60.0 | 66.67 / 100 / 80.0 | 0.00 / 0.00 / 0.00
Class 4 | 50.0 / 100.0 / 66.7 | 33.33 / 50.00 / 39.98 | 50.0 / 50.0 / 50.0 | 83.33 / 83.33 / 83.33
Class 5 | 33.33 / 100.0 / 49.96 | 33.33 / 100.0 / 49.96 | 50.0 / 100.0 / 66.67 | 33.33 / 100.0 / 49.96
Class 6 | 85.71 / 75.00 / 79.99 | 87.5 / 87.5 / 87.5 | 75.00 / 90.00 / 81.82 | 80.00 / 88.80 / 84.17
Class 9 | 100.0 / 100.0 / 100.0 | 66.67 / 100.0 / 79.99 | 100.0 / 100.0 / 100.0 | 100 / 100 / 100
Class 10 | 62.50 / 66.66 / 64.53 | 80.00 / 54.54 / 64.86 | 55.56 / 83.33 / 66.71 | 86.67 / 68.42 / 76.47

100 – OOB-error (%), with all variables and with important variables: 67.29 70.10 71.01 72.46 64.50 66.50 61.66 64.25

  • 6. Implementation of R-based Classifier for Real-Time Analysis
R code can be executed from within a bash script, which allows us to invoke it from within a Java program (or any programming language or script, for that matter). Storm is an open-source real-time computation framework that allows us to process streams of data in parallel, making it a very good choice for classification of data on a cluster of nodes. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation. The model file created in the previous step is referenced in another R script, which is used for real-time classification. The data to be classified enters the Storm framework via a spout, which then emits it to the bolts. Each bolt runs the R script in parallel and emits the results of the classification (which can be captured and used as needed), as shown in Fig. 3.

Note that for each result in Table 2, one node is a Nimbus node and the remaining nodes are supervisors. Each node has 8 quad-core CPUs, 32 GB of RAM, and 32 GB of swap space.

Table 2. ECG Classification Performance Analysis
Time taken (in seconds) by number of predictions (ECG categorizations)

Configuration | 20K | 40K | 0.1 million
Sequential processing (no Storm used) | 3,600 | 7,200 | 18,300
Storm cluster with 2 nodes (1 spout, 8 bolts) | 900 | 1,710 | 4,440
Storm cluster with 3 nodes (1 spout, 16 bolts) | 450 | 900 | 2,400

Note: We made use of only one spout for this POC. Depending on the mechanisms of data entry into the Storm framework, it is possible to use multiple spouts, which would enhance performance further.

Concluding Remarks
This article has presented a real-time machine-learning platform for the healthcare domain that allows ECG signals to be classified.
It is an additional input for the physician, but a crucial one that facilitates care-for-value. This work provides the basis for building a powerful analytical framework that can work in real time. It could prove extremely useful, not only for ECG classification, but also for enabling physicians to get incremental analytics on the various kinds of patient data increasingly available in EHRs. Our study also enables incremental healthcare, where the focus can shift to analytics and, consequently, to customized real-time healthcare. The upcoming health exchanges may also benefit, as on-the-fly analytics on high-velocity data becomes essential for providers, physicians, and patients equally.

FIG. 3. Running R over Storm.

Author Disclosure Statement
All authors are employed by Impetus.

References
1. Gao D, Madden M, Chambers D, Lyons G. Bayesian ANN classifier for ECG arrhythmia diagnostic system: A comparison study. Proceedings of 2005 IEEE International Joint Conference on Neural Networks (IJCNN '05) 2005; 4:2383–2388.
2. Rothman SA, et al. The diagnosis of cardiac arrhythmias: A prospective multi-center randomized study comparing mobile cardiac outpatient telemetry versus standard loop event monitoring. J Cardiovasc Electrophysiol 2007; 8:1–7.
3. Guvenir HA, Acar S, Demiroz G, Cekin A. A supervised machine learning algorithm for arrhythmia analysis. Comput Cardiol 1997; 7:433–436.
  • 7. 4. Breiman L. Random Forests. Mach Learn 2001; 45:5–32.
5. Liaw A. Missing value imputations by randomForest. R documentation. Available online at http://rss.acs.unt.edu/Rdoc/library/randomForest/html/rfImpute.html. (Last accessed on September 6, 2013).
6. Ishioka T. Imputation of missing values for unsupervised data using the proximity in random forests. In: Proceedings of The Fifth International Conference on Mobile, Hybrid, and On-line Learning. Nice, France, February 24–March 1, 2013.
7. Honaker J, King G, Blackwell M. AMELIA II: A program for missing data. J Stat Softw 2011; 45:1–47.
8. Blagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics 2013; 14:106. Available online at www.biomedcentral.com/1471-2105/14/106. (Last accessed on September 6, 2013).
9. Calle ML, Urrea V. Letter to the editor: Stability of random forest importance measures. Briefings Bioinf 2011; 12:86–89.
10. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manag 2009; 45:427–437.

Address correspondence to:
Vijay Srinivas Agneeswaran, PhD
Innovation Labs
Impetus Infotech India Private Limited
Pritech Park SEZ, Bellandur
Outer Ring Road
Bangalore, Karnataka 560103
India
E-mail: vijay.sa@impetus.co.in