IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
1. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.458-463
An Automated Weighted Support Approach Based Associative
Classification With Analytical Study For Health Disease
Prediction
*Mrs. Suwarna Gothane, **Dr. G.R.Bamnote
*Asst Prof (CSE)ATRI, Uppal. Hyderabad
**HOD(CSE)P.R.M.I.T&R, Badnera, Amravati
Abstract
Association mining has seen its system throughput the doctor may decide the
development right through data mining ,which medication [6].
reveal implicit relationship among data attributes
It is the research topic in data mining and use for 2. ASSOCIATION CLASSIFICATION:
applications such as business, science, society and This method of mining is an algorithm based
everyone .Association rule helps in marketing, technique with support of some association
decision making, business management ,cross constraints to repeat the data. The data available is
marketing ,catalog designing ,clustering disjointed into 2 parts.one of which is 70% of total
,classification etc. Finding small database sets data and is taken as training data and the rest taken as
could be done with the help of predictive analysis another part is used for testing purpose. This
method. The paper enlightens the combinational technique of data mining is done on data base sets
categorization of association and classification with a record of information.
data mining. For fields like medicine where a lot Step1: with the help of training set of data create
several patients consult doctor, but every patient an association rule set.
has got different personal details not necessarily Step2: eliminate all those rules that may cause
may suffer with same disease. So the doctor may over fitting.
look for a classifier, which could present all details Step3: finally we predict the data and check for
about every patient and in future necessary accuracy and this is said to be the classification
medications can be provided. However there have phase.
been many other classification methods like One such example of data base set is:
CMAR, CPAR MCAR and MMA and CBA.
Some advance associative classifiers have also seen
development very recently with small
amendments in terms of support and confidence,
thereby the accuracy. In this paper ,we proposed a Figure 1: Process Flow diagram of Associative
HITS algorithm based on automated weight Classifier
calculation approach for weighted associative Transaction Items in
classifier. ID Transaction
Keywords: classifier, Association rules, data 1 ABCDE
mining, healthcare, Associative Classifiers, CBA, 2 ACE
CMAR, CPAR, MCAR 3 BD
4 ADE
1. INTRODUCTION:
A set of steps followed to take out data from 5 ABCDE
the related pattern is termed as data mining. From a
haphazard data set it is possible to obtain new data. Table 1: Transactional Database
The analytical modeling approach that simply An association rule is an implication of the
combines association and classification mining form A B , where A, B I , where I is set of
together [3] shows better accuracy [10]. The
all items, and A B . The rule A B has a
classification techniques CBA [10], CMAR [9],
CPAR [8] out beat the usual classifiers C4.5, FOIL, support s in the transaction set D if s% of the
RIPPER which are faster but not accurate. Associate transactions in D contain A B .The rule A B
classifiers are fit to those model applications which holds in the transaction set D with confidence c if c%
provide support domains in the decisions. However of transactions in D that contain A also contain B . if
the most matched example for this is medical field the threshold point is crossed in terms of confidence
where in the data for each patient is required to be then the association rules could be determined. Thus
stored, with the help of which the system predicts the
diseases likely to be upsetting the patient. With the
458 | P a g e
2. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.458-463
the determined rules form a confidence frame with efficient and friendly for end user. Whenever any
the help of high strength rules. amendments need to be done in a tree they can be
On a particular set of domains the AC is made without disturbing the other attributes.
performed. A tuple is a collection of m attributes
a1 , a2 ,...., am and consists of another special 3. ADVANCEMENTS IN CAR RULE
predictive Attribute (Class Label). However these GENERATION:
attributes can be categorized depending action The accuracy of the classification however
methods. Association rule mining is quite different depends on the rules disguised in the classification.to
from AC but still undergo following similarities: overcome CARs rules inaccuracy in some cases, a
i. Attribute which is pair (attribute, value), is used new advanced ARM in association with classifiers
in place of Item. For example (BMI, 40) is an has been developed. This new advanced technique
attribute in Table 2 provides high accuracy and also improves forecast
ii. Attribute set is equivalent to Itemset for example capabilities.
((Age, old), (BP, high)).
3.1. AN ASSOCIATIVE CLASSIFIER BASED
iii. Support count of Attribute ( Ai , vi ) is number of ON POSITIVE AND NEGATIVE
rows that matches Attribute in database. APPROACH:
iv. Support count of Attribute set Negative association rule mining and
( Ai , vi ),...., ( Am , vm ) is number of rows that associative classifiers are two relatively new domains
of research, as the new associative amplifiers that
match Attribute set in data base.
take advantage of it. The positive association rule of
v. An Attribute ( Ai , vi ) passes the minsup entry if
the form X Y to X Y , X Y and
support count ( Ai , vi ) min sup . X Y with the meaning X is for presence and
Heart X is for absence. Based on correlation analysis the
Recor Ag Smok Hypertensi Disea algorithm uses support confidence in its place of
d ID e es on BMI se using support-confidence framework in the
1 42 YES YES 40 YES association rule generation. Correlation coefficient
measure is added to support confidence framework as
2 62 YES NO 28 NO it measures the strength of linear relationship
3 55 NO YES 40 YES between a pair of two variables. For two variables X
con( X , Y )
and Y it is given by
4 62 YES YES 50 YES
, where
5 45 NO YES 30 NO x, y
con( X , Y ) represents the covariance of two
variables and x stand for standard difference.
Table 2: Sample Database for heart patient.
vi. An Attribute set (( Ai , vi )....( Am , vm )) passes The range of values for is between -1 to +1,when
the min sup entry if support count it is +1 the variables are completely correlated, if it is
-1 the variables are completely independent then
(( Ai , vi ), ( Ai 1 , vi 1 )....( Aj , v j )) min sup . equals to 0.when positive and negative rules are used
vii. CAR Rules are of form for classification in UCI data sets hopeful results will
(( Ai , vi ), ( Ai 1 , vi 1 )....( A j , v j )) c where c obtain. Negative association rules are efficient to take
out hidden knowledge. And if they are only used for
Class-Label. Where Left hand side is itemset classification, accuracy decreases.
and right hand sigh is class. And set of all
attribute and class label together ie ((Ai, vi), 3.2. TEMPORAL ASSOCIATIVE
…,(Aj, vj),c) is called rule attribute. CLASSIFIERS:
viii. Support count of rule attribute As data is not always static in nature, it
(( Ai , vi ), ( Ai 1 , vi 1 )....( A j , v j ), c) is number changes with time, so adopting chronological
of rows that matches item in database. dimension to this will give more practical approach
Rule attribute and yields much better results as the purpose is to
provide the pattern or relationship among the items in
(( Ai , vi ), ( Ai 1 , vi 1 )....( A j , v j ), c) passes the
time domain. For example rather than the basic
min sup entry if support count of association rule of bread butter mining
(( Ai , vi ), ( Ai 1 , vi 1 )....( A j , v j ), c) min sup from the chronological data we can get that the
An important subset of rules called class
association rules (CARs) are used for classification
support of bread butter raises to 50%
purpose since their right hand side is used for during 7 pm to 10 pm everyday [3],as These rules are
attributes. Its simplicity and accuracy makes it more informative they are used to make a strategic
459 | P a g e
3. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.458-463
decision making. Time is an important aspect in “Income [20-50k$] Age [20 -30]”, and so on
temporal database. [ZC08]. As the record belongs to only one of the set
1. Scientific, medical, dynamic systems, computer results in sharp boundary problem which gives rise to
network traffic, web logs, markets, sales, the notion of fuzzy association rules (FAR).The
transactions, machine/device performance, semantics of a fuzzy association rule is richer and
weather/climate, telephone calls are examples of natural language nature, which are deemed attractive.
time ordered data. For example, “low-quantity Fruit normal -
2. The volume of dynamic time-tagged data is consumption Meat” and “medium Income
therefore growing, and continuing to grow, young Age” are fuzzy association rules, where X’s
3. as the monitoring and tracking of real-world and Y’s are fuzzy sets with linguistic terms (i.e., low,
events frequently require frequent normal, medium, and young).An associative
measurements.3.Data mining methods require classification based on fuzzy association rules
some alteration to handle special temporal (namely CFAR) is proposed to overcome the “sharp
relationships (“before”, “after”, “during”, “in boundary” problem for quantitative domains. Fuzzy
summer”, “whenever X happens”). rules are found to be useful for prediction modeling
4. Time-ordered data lend themselves to prediction system in medical domain as most of the attributes
– what is the likelihood of an event, given the are quantitative in nature hence fuzzy logic is used to
previous history of events? (e.g., hurricane deal with sharp boundary problems.
tracking, disease epidemics) 3.3.1 Defining Support and confidence measure: New
5. Time-ordered data often link certain events to formulae of support and confidence for fuzzy
specific patterns of temporal behavior (e.g., classification rule F C are as follows:
network intrusion breaks INS).The new type of Sum of membership values of antecedent with class label C
sup port ( F C )
AC called Temporal Associative Classifier is Total No. of Records in the Database
being proposed in [3] to deal with above such
state. CBA, CMAR and CPAR are customized Sum of membership values of antecedent with class label C
confidence( F C )
Sum of membership values of antecedent for all class label
with temporal measurement and proposed
TCBA, TCMAR and TCPAR. To compare the
classifying accuracy and implementation time of 3.4 WEIGHTED ASSOCIATIVE CLASSIFIERS
the three algorithms using temporally Based on the unlike features weights are
customized data set of UCI machine learning selected based on this classifier. Every attribute
data sets an experiment has performed and varies in terms of importance. it also important to
conclusions are: know that with the capabilities of predicting the
i. TCPAR performs better than TCMAR and weights may be altered. Weighted associative
TCBA as it is time consuming for smaller classifiers consist of training dataset
support values but improves in run- time T r1,r2, r3. ri with set of weight associated
performance as the support increases. with each attribute, attribute value pair. Each with
ii. Using data set the accuracy is considered for
record is a set of attribute value and a weight wi
each algorithm. The average accuracy of TCPAR
attached to each attribute of tuple / record. Aweighted
is found little better than TCMAR.
framework has record as a triple {ai, vi, wi} where
iii. The temporal complement of all the three
attribute ai is having value vi and weight wi,
associative classifiers has shown better
0<wj<=1. Thus with the help of weights one can
classification accuracy as compare to the non-
easily find out its predicting ability. With this
temporal associative classifier. Time-ordered
weighted rules like “medium Income young Age” ,
data lend themselves to prediction like what is
“{(Age,”>62”), (BMI,“45”), (Boold_pressur,“95-
the likelihood of an event e.g., (hurricane
135”)}, Heart Disease,(Income[20,000-
tracking, disease epidemics). The temporal data
30,000]Age[20 -30]) could become the criteria of
is useful in predicting the disease in different age
determination. Weights of data as per table 2 are
group.
recorded in table 4 with the help of weights
mentioned in the table 5 for predicting attributes.
3.3. ASSOCIATIVE CLASSIFIER USING
FUZZY: Recor
Association Rule: The quantitative attributes d
are one of preprocessing step in classification. for the Recor Ag Smok Hypertensi BM Weig
data which is associated with quantitative domains d ID e es on I ht
such as income, age, price, etc., in order to apply the 1 42 YES YES 40 0.6
Apriori-type method association rule mining 2 62 YES NO 28 0.42
requirements to separation the domains. Thus, a
discovered rule X Y reflects association 3 55 NO YES 40 0.52
between interval values of data items. Examples of 4 62 YES YES 50 0.67
such rules are “Fruit [1-5kg] Meat [520$]”, - 5 45 NO YES 30 0.45
460 | P a g e
4. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.458-463
Table 4: Relational Database with record weight weights represent the potential of transactions to
Here we measure the record weight using fallowing contain high-value items. A transaction with few
|I | items may still be a good hub if all component items
w i are top ranked. Conversely, a transaction with many
recordWeight i 1
ordinary items may have a low hub weight.
|I|
Here 3.4.1.2. W-support - A New Measurement:
| I | is total number of items in record Item set estimation by support in classical
i is an item in record association rule mining [1] is based on counting. In
wi is weight of the item i this section, we will introduce a link-based measure
called w-support and formulate association rule
3.4.1 Measuring weights using HITS algorithm mining in terms of this new concept.
3.4.1.1. Ranking Transactions with HITS The previous section has established the application
A database of transactions can be depicted as a of the HITS algorithm [3] to the ranking of the
bipartite graph without loss of information. Let transactions. As the iteration converges, the authority
D {T1 , T2 ,...Tm } be a list of transactions and weight auth(i ) hub(T ) represents the
I {i1 , i2 ,...in } be the equivalent set of items. T :iT
“significance” of an item I, accordingly, we
Then, clearly D is equivalent to the bipartite graph
generalize the formula of auth(i ) to depict the
G ( D, I , E ) where significance of an arbitrary item set, as the following
E {(T , i) : i T , T D, i I } definition shows:
Definition 1: The w-support of an item set X is
defined as
hub(T )
w sup p( X ) T : X T T D ....(2)
hub(T )
T :T D
Where hub(T ) is the hub weight of
Fig1: The bipartite graph illustration of a database (a)
Database (b) Bipartite graph transaction T. An item set is said to be significant if
its w-support is larger than a user specified value
Example 1: Consider the database shown in Fig. 1a. Observe that replacing all hub(T ) with 1 on the
It can be consistently represented as a bipartite graph, right hand side of (2) gives supp(X). Therefore, w-
as shown in Fig. 1b. The graph representation of the support can be regarded as a generalization of
transaction database is inspiring. It gives us the idea support, which takes the weights of transactions into
of applying link-based ranking models to the account. These weights are not determined by
estimation of transactions. In this bipartite graph, the assigning values to items but the global link structure
support of an item i is proportional to its degree, of the database. This is why we call w-support link
which shows again that the classical support does not based. Moreover, we claim that w-support is more
consider the variation between transactions. reasonable than counting-based measurement.
However, it is critical to have different weights for
different transactions in order to return their different ID Symptoms Weight
importance. The assessment of item sets should be 1 Age 40 0.1
derivative from these weights. Here comes the
question of how to acquire weights in a database with 2 40 Age 58 0.2
only binary attributes. Intuitively, a good transaction, 3 Age 58 0.3
which is highly weighted, should contain many good
items; at the same time, a good item should be 4 Smokes yes 0.8
contained by many good transactions. The 5 Smokes no 0.7
reinforcing relationship of transactions and items is
6 Hypertension yes 0.6
just like the relationship between hubs and authorities
in the HITS model [3]. Regarding the transactions as 7 Hypertension no 0.5
“pure” hubs and the items as “pure” authorities, we 8 BMI 25 0.1
can apply HITS to this bipartite graph. The following
equations are used in iterations: 9 26 BMI 30 0.3
auth(i ) hub(T ), hub(T ) auth(i )....(1) 10 31 BMI 40 0.5
T :iT i:iT
11 BMI 40 0.8
Table 5: weights calculated using anticipated
When the HITS model finally converges, the hub
weights of all transactions are obtained. These algorithm
461 | P a g e
5. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.458-463
3.4.2 Defining Support and confidence measure: CONCLUSION:
New formulae of support and confidence for This advanced AC method could be applied
classification rule X class _ Label , where X is in real time scenario to get more accurate results.
set of weighted items, is as follows: Weighted This needs lot of prediction to be done based on its
Support: Weighted support WSP of rule capabilities which could be improved.It find it major
application in the field of medical where every data
X class _ Label , where X is set of non empty
has an associated weight. The proposed HITs
subsets of attribute value set, is part of weight of the algorithm based weight measurement model is
record that contain above attribute-value set significantly improving the quality of classifier.
comparative to the weight of all transactions.
|X |
v1 weight (ri ) REFERENCES:
i 1 [1]. SunitaSoni ,JyothiPillai,O.P. Vyas An
|n| Associative Classifier Using Weighted
v 2 weight (ri ) Association Rule2009 International
k 1 Symposium on Innovations in natural
v1 Computing, World Congress on Nature &
WSP( X Class _ Label )
v2 Biologically
The classification accuracy development for [2]. Zuoliang Chen, Guoqing Chen BUILDING
Weighted associative classifier with proposed weight AN ASSOCIATIVE CLASSIFIER BASED
measurement approach can be observable in fig 2 and ON FUZZY ASSOCIATION RULES
fig 3. International Journal of Computational
Fig 2: A line chart demonstration of classification Intelligence Systems, Vol.1, No. 3 (August,
accuracy differences between Traditional pre 2008), 262 - 273
assigned weighted associative classifier(PAWS- [3]. RanjanaVyas, Lokesh Kumar Sharma, Om
Associative classifier) and proposed automated Prakashvyas, Simon ScheiderAssocitive
weighted support associative classifier(AWS- Classifiers for Predictive analytics:
Associative Classifier) Comparative Performance Study, second
UKSIM European Symposium on Computer
Modeling and Simulation 2008.
[4]. E.RamarajN.VenkatesanPositive and
Negative Association Rule Analysis in
Health Care Database ,IJCSNS International
Journal of Computer Science and Network
Security, VOL.8 No.10, October 2008,325-
330.
[5]. Khan, M.S. Muyeba, M. Coenen, F A
Weighted Utility Framework for Mining
Association Rules, Symposium Computer
Modeling and Simulation, 2008. EMS '08.
Second UKSIM European , page(s): 87-92.
[6]. FadiThabtah, A review of associative
4. REFINING SUPPORT AND classification mining, The Knowledge
Engineering Review, Volume 22, Issue 1
CONFIDENCE MEASURES TO
(March 2007), Pages 37-65, 2007.
VALIDATE DOWNWARD CLOSURE [7]. LuizaAntonie, University of Alberta,
PROPERTY: Advancing Associative Classifiers -
The downward closure property is the key Challenges and Solutions, Workshop on
part of Apriori algorithm.it states that any super set Machine Learning, Theory, Applications,
can’t be frequent unless and until its itemset isn’t Experiences 2007
frequent. The itemsets that are already found to be [8]. Feng Tao, FionnMurtagh and Mohsen Farid.
frequent are added with new items based on the Weighted Association Rule Mining using
algorithm. However changes in support and Weighted Support and Significance
confidence shall not show its result on this property Framework Proceedings of the ninth ACM
and also AC associated with advanced rule SIGKDD international conference on
developer. The terms support and confidence are to Knowledge discovery and data mining 2003,
be replaced with weighted support and weighted Pages:661-666 Year of Publication: 2003
confidence respectively in WAC which elicits that [9]. Yin, X. & Han, J. CPAR: Classification
weighted support helps maintain weighted closure based on predictive association rule. In
property. Proceedings of the SIAM International
462 | P a g e
6. Mrs. Suwarna Gothane, Dr. G.R.Bamnote / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September- October 2012, pp.458-463
Conference on Data Mining. San Francisco,
CA: SIAM Press, 2003,pp. 369-376.
[10]. W. Li, J. Han, and J. Pei. CMAR: Accurate
and efficient classification based on
multiple class association rules. In
ICDM'01, pp. 369-376, San Jose, CA,
Nov.2001.
[11]. Liu,B Hsu. W. Ma, Integrating Clasification
and association rule mining .Proceeding of
the KDD, 1998(CBA) pp 80-86.
[12]. Cláudia M. Antunes , and Arlindo L.
Oliveira Temporal Data Mining: an
overview.
[13]. Antonie, M. &Zaïane, O. An associative
classifier based on positive and negative
rules. In Proceedings of the 9th ACM
SIGMOD Workshop on Research Issues in
Data Mining and Knowledge Discovery,
2004,pp 64-69.
463 | P a g e