SlideShare ist ein Scribd-Unternehmen logo
1 von 8
Downloaden Sie, um offline zu lesen
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING
 International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3,(IJCET)
                            & TECHNOLOGY October-December (2012), © IAEME
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 3, Issue 3, October - December (2012), pp. 265-272
                                                                                IJCET
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2012): 3.9580 (Calculated by GISI)                   ©IAEME
www.jifactor.com




        SIGNIFICANCE OF INTEGRATED TAXONOMY APPROACH IN
                      DIVERSE LIVER CHAOSES
                         A.S.Aneeshkumar1, Dr. C. Jothi Venkateswaran2
                    1
                     Research Scholar, PG & Research Dept. of Computer Science,
                  Presidency College, Chennai, India, aneesh_kumar777@yahoo.com
              2
                Research Supervisor &Dean, Dept. of Computer Science & Applications,
                 Presidency College, Chennai, India, jothivenkateswaran@yahoo.co.in

     ABSTRACT

             Liver chaos is an associated factor with other serious diseases like cancer, kidney
     failure and cardio vascular problems. The preliminary stage of liver dysfunction can be
     classified as three, which are Hepatitis, Alcoholic Fatty Liver Disease (AFLD) and Non-
     alcoholic Fatty Liver Disease (NAFLD). In this paper we are applying various integrated
     classification models of Data Mining for the evaluation of performance.

     Keywords: Associative Classification, Classification Based on Association, Classification
     based on Multiple Association Rules, Classification based on Predictive Association Rules,
     First Order Inductive Learner, Preprocessing.

I.          INTRODUCTION


             Many health care organizations having large number of health related data without
     proper analysis or storage. In a successful organization, it is import to empower the staff and
     management with data warehousing based on critical thinking and knowledge management
     tools for strategic decision making [1]. Medical diagnosis is regarded as an important yet
     complicated task that needs to be executed accurately and efficiently. Regrettably all doctors
     do not possess expertise in every subspecialty fields and moreover there is a shortage of
     resource persons in certain places. Therefore, an automatic medical diagnosis system would
     probably be exceedingly beneficial by bringing all of them together [2].
             Determining the early stages of liver disease is better to avoid other consequences.
     The common liver related problems seen in Indian society are Hepatitis (Viral attack),
     Alcoholic Fatty Liver Disease (AFLD) and Non-alcoholic Fatty Liver Disease (NAFLD).
     Hepatitis can be broadly classified as HAV, HBV, HCV, HDV, and HEV. The first three are

                                                  265
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
  6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME

  very common in anywhere, but remaining two is seen very rarely and therefore such data are
  not available for this study. Normally, it is difficult to identify the exact reason of Liver
  dysfunction in some extend without further examinations and so in this work we are
  developing a model to predict the accurate type of disease without false assumptions.
          Human civilization since the time of immemorial to today, the alcohol is ubiquitous,
  with constantly changing patterns of alcohol intake around the world [3]. High level of
  continuous alcoholic consumption will lead to AFLD. Modern life style and lack of exercise
  are associated with an increased prevalence of diabetes mellitus (DM), Obesity,
  Hypertension, hyper triglycerides and which are calculated as major reasons of NAFLD.
  Hepatitis A may be spread primarily through food or water from an infected person and
  others are through sexual contact with an infected person, injection of drugs, infected women
  to infants and blood transfer. In some cases toxic drug effects will also be the factor of
  deviated results of the liver function tests and theses are seen in rare cases. Some hepatitis is
  considered to have autoimmune hepatitis, on the basis of combination of psychological and
  histological features, clinical and biochemical features as described by the International
  Autoimmune Hepatitis Group Report [4].
          Radiologic imaging with Ultra-Sonography (US), Computed Tomography (CT) or
  Magnetic Resonance Imaging, used either singly or in combination, for detection of adequate
  threshold of fatty infiltrations in the liver. Then in the absence of obvious contraindications, a
  liver biopsy should be conducted in patients with unexplained sustained abnormalities of the
  liver biochemistry [5]. The greatest benefit of liver biopsy in unexplained abnormal LFT is to
  cover progressive liver disease either directly or through prompting a further diagnostic
  analysis.

II.        DATA DESCRIPTION
           The collected data for this study from a reputed hospital consists of all symptoms,
  doctor’s observation and liver biochemistry of the patients, where in addition to Liver
  Function Test (LFT), Triglycerides report also included for this study. Symptoms of Liver
  problems are uncertain in most cases. So we neglect such items and the following factors are
  the attributes for this work.

  TABLE 1. Biochemical Markers of LFT

                                  Lower         Upper
          Test Items
                                  Limit         Limit
          Bilirubin(D)              0.1          0.4
          Bilirubin(T)              0.3          1.3
            S.G.O.T                 12           38
            S.G.P.T.                 7           41
      Alkaline Phosphatase          80           290
          Gamma GT                 11/ 7        50/32
         Total Proteins             6.7          8.6
            Albumin                 4.1          5.3

                                                 266
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
   6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME

            Globulins                2.0           3.5

   TABLE 2. Other dependent factors

              HBAIC                  Yes           No
              Obesity                Yes           No
                 BP                  Yes           No
           Triglycerides             Yes           No
       Alcoholic consumption         Yes           No


III.        METHODOLOGY
            The combination of knowledge discovery, information retrieval, deductive learning
   and exploratory data analysis can be called as Data mining [6]. Different algorithms are
   involved in it to extract models or patterns for particular prediction. The word Association in
   Mining is finding frequent item sets and discovery of association or correlation among such
   frequent items in a huge transaction. Another data analysis method called Classification is
   used to identify some important entities for prediction, where it extracts different models for
   better grouping of similar data types.
            In recent years the data mining researchers proposed a new integrated predictive
   approach for better accuracy is known as associative classification (AC), which combines
   associative mining and classification. According to most of the studies, we can see that, it has
   better accuracy and rule evaluation when compare to other classifications and rule based
   classification techniques. Because some rules used in AC will not appear in basic
   classification techniques. AC follows the method of classification in associative rule
   evaluation and the major application of this is in medical and engineering field. Medical
   practitioners are always following an inherent approach of combinational symptom and
   report based diagnosis for disease identification. Normally AC performs its analysis in
   relational database and each entity consists of attributes like ܽଵ , ܽଶ , ܽଷ , … … …	ܽ௡ and one of
   them will consider as Class label or Predictive attribute [7]. Rule generation, rule pruning and
   classification are the three steps of AC. The first AC algorithm introduced as Classification
   Based on Association (CBA) and it is based on the Apriori algorithm, which gives Class
   Associated Rule (CAR). These rules will pruned and then select most appropriate rules from
   it, for further classification. The CBA will mines all CARs and produce a classification from
   generated CARs, then it calculate the accuracy of the Association Classifier [8]. Multiple
   scan features of Apriori produce large number of rules and which maintain large
   computational time.
            Classification based on Multiple Association Rules (CMAR) and Classifications
   based on Predictive Association Rules (CPAR) are introduced later as the extension of CBA.
   The CMAR implements FP-growth algorithm instead of Apriori for the generation of
   frequent item sets [9]. It first generate all the association rules with certain support and
   confidence thresholds and then selects a small set of rules from them to form a classifier.
   Then after, it identifies the class label for the best rule with best confidence. When compare

                                                  267
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
  6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME

  with previous classification it uses multiple rules in prediction, with the help of weighted	ܺ ଶ
  and so CMAR provides more accuracy in classification.
           Classification based on Predictive Association Rules combines the advantages of both
  associative classification and traditional rule-based classification to attain maximum
  accuracy. It extends FP growth with interestingness and overlapping relationships in
  formation of rules. CPAR generates a smaller set of rules, with higher quality and lower
  redundancy in comparison with other associative classification techniques. So CPAR is much
  efficient in case of rule generation, prediction and accuracy. More over than that CPAR
  generates a small set of predictive rules directly from the dataset based on the rule prediction
  and coverage analysis with lesser time, which is opposite to generating candidate rules like in
  other Associative Classifications, where datasets contain a large number of rows and
  columns. So they needs more time to build the model. It follows dynamic programming
  pattern to avoid repeated calculation in rule generation and select the entire close to the best
  literals instead of selecting only the best literal only, so the important rules will not lost.

IV.      DATA PREPROCESSING
         To avoid certain inappropriate, duplicate, unrelated and missing fields it is necessary
  to implement preprocessing techniques. In our approach, we used dimensionality reduction to
  avoid patient’s psychodynamic factors and symptoms, because it may show a discrepancy
  according to the type and variation of Liver disorder. In our previous work, we used Principal
  Component Analysis (PCA) for this reduction. It searches for ݇	݊-dimensional orthogonal
  vectors, which can bitterly represent the data, where ݇ ≤ ݊. the original data are thus
  projected onto a much smaller space, while it results dimensionality reduction. PCA always
  combines the essence of attributes by creating an alternative, smaller set of variables [10].

V.       MODEL BUILDING
         In this paper, we build a model to evaluate the performances of four algorithms,
  where three are incorporated. Three separate and consolidate datasets are used here for this
  model construction.

 5.1.     FOIL
          First Order Inductive Learner is a sequential covering algorithm which learns first-
  order logic rules. The tuples of the class for learning rules are called positive tuples, where
  the remaining is called as negative tuples. The number of ‫	)݁ݒ݅ݐܽ݃݁݊(݁ݒ݅ݐ݅ݏ݋݌‬tuples
  covered by the rule ܴ1	is represented as ‫ )݃݁݊(ݏ݋݌‬and ‫ ݏ݋݌‬ூ (݊݁݃ ூ ) be the number of
  ‫ )݁ݒ݅ݐܽ݃݁݊(݁ݒ݅ݐ݅ݏ݋݌‬tuples covered by	ܴ ூ . Basic idea of CPAR observed the concept of
  FOIL [11]. The information gain of rules which provide high accuracy and maximum
  positive tuples are
                                          	‫ ݏ݋݌‬ூ                 ‫ݏ݋݌‬
                       ‫	 = ݊݅ܽܩ_ܮܫܱܨ‬      ூ + ݊݁݃ ூ
                                                    − ݈‫	2݃݋‬              								
                                      ‫ݏ݋݌‬                    ‫݃݁݊ + ݏ݋݌‬




                                                268
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME

which build rules to distinguish positive tuples from negative tuples. In cases of multiple
classes, FOIL is applied to each class, where if a rule generates, the positive sample it covers,
will be removed until all positive tuples in the data set are covered.


5.2.    CBA
        CBA uses a heuristic method to construct the classifier, unlike Apriori with the
satisfaction of minimum support and minimum confidence level, where the rules are
organized according to decreasing precedence based on their confidence and support [12]. If
a set of rule with same antecedent, then the rule with the highest confidence is selected to
represent the set. When classifying a new tuple, the first rule satisfying the tuple is used to
classify it. The classifier also contains a default rule, having lowest precedence, which
specifies a default class for any new tuple that is not satisfied by any other rule in the
classifier [13]. The set of these rules make a decision set for this classifier.

5.3.    CMAR
        CMAR uses an enhanced FP-tree that maintains the distribution of class labels among
tuples, which satisfying each frequent itemset, and it combine rule generation together with
frequent itemset mining in a single step. For an example ܴ1 and ܴ2 are two rules, where
antecedent of 	ܴ1 is more general than that of ܴ2 and	ܿ‫ ,)2ܴ(݂݊݋ܿ ≥ )1ܴ(݂݊݋‬then ܴ2 is
pruned. That defines the rules with low confidence can be pruned is a more generalized
version with higher confidence exists [14].

5.4.    CPAR
       CPAR relaxes the FOIL application of removing positive samples by allowing the
covered tuples to remain there, but it reduces the weight. The resulting rule from each class in
this manner will merge to form the classifier rule set. But it produces fewer rules than other
CBA and CMAR.
For an example,

                  (1)            (2)            (3)

                                 (2)            (4)

                                                 (1)             (2)
                                                                 (1)           (3)



is generated for the following four rules.
                             ܴ1 = (‫ܣ‬ଵ = 1, ‫ܣ‬ଶ = 2, ‫ܣ‬ଷ = 3)
                             ܴ2 = (‫ܣ‬ଵ = 1, ‫ܣ‬ଷ = 2, ‫ܣ‬ଶ = 4)
                             ܴ3 = (‫ܣ‬ଵ = 1, ‫ܣ‬ଶ = 1, ‫ܣ‬ସ = 2, ‫ܣ‬ଷ = 2)
                             ܴ4 = (‫ܣ‬ଵ = 1, ‫ܣ‬ସ = 1, ‫ܣ‬ଷ = 2, ‫ܣ‬ଶ = 1, ‫ܣ‬ହ = 3		)




                                                269
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
  6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME

VI.       ANALYSIS AND RESULTS
          As whispered earlier, because of ambiguity, we applied data reduction technique in
  preprocessing stage to reduce the dimensions. Then the attributes collected from the patients
  are shown in Table 1 and 2 for this work in addition to a class diagnosis. In Table 1, we can
  see the accuracy / Time of the classifier, where the full data set is used as a training set to
  build the model and again test it with the developed replica. First we applied these algorithms
  to identify viral infections of liver from the general category. In the second step we applied
  another set of data for classification, which related to the details of AFLD and the third
  dataset of NAFLD performs with lesser time. Finally we applied the combination of these
  three data sets to evaluate the performance. According to these results the average time taken
  by each method is same in Hepatitis data and the highest accuracy given by CPAR only. In
  case of AFLD data, four algorithms share a common time of 0.02 seconds with almost same
  and fine accuracy. The performance of CMAR is somewhat better here. In the next step also
  CPAR having better performance with 99.0 / 0.0 rate. I Finally in case of combined data,
  when compare to other three FOIL shows poor performance in both cases of accuracy and
  time, where CPAR completed within 0.02 seconds of time with good result. Figure 1
  represents the performance of all these algorithms.
        TABLE 3. Evaluation of algorithms
        DATA TYPES             FOIL            CBA             CMAR              CPAR
            Hepatitis          88.26 / 0.04    91.70 / 0.05     91.98 / 0.04      92.0 / 0.04
             AFLD              98.08 / 0.02    98.09 / 0.02     98.40 / 0.02     98.10 / 0.02
            NAFLD              98.38 / 0.01    98.58 / 0.00     98.82 / 0.01      99.0 / 0.0
           Consolidate         94.0 / 0.04     97.27 / 0.04     97.28 / 0.02     97.62 / 0.02


                                                     FOIL
                                               100
                                                98
                                                96
                                                94
                                                92
                                                90
                                                88
                                                86
                                                84
                        CPAR                    82                         CBA




                                                 CMAR

                         Hepatitis infection     ALD          NALD      Consolidate

                                  Fig 1. Rule quality performance

                                                     270
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME

VII. CONCLUSION
    The observed factors used for this model are very important for liver disease prevalence.
Most of the liver problems are presented because of the life style and behaviour of the
peoples. In this analysis accuracy shows the quality of the rule. The integration of multiple
rules gives more accurate results. As the result the overall performance of CPAR is better
than other classifiers. So this approach can be adapted to other application field like medical,
insurance, banking, recruitment and psychology for better prediction in the field of their
expected results.

REFERENCES

Journal Papers:
[1]   Shantakumar B. Patil and Y.S.Kumaraswamy, Intelligent and Effective heart Attack
      Prediction System Using Data Mining and Artificial Neural Network, European
      Journal of Scientific Research, ISSN 1450-216X Vol.31 No.4(2009), pp.642-656.
[2]   P.Rajeswari and G.Sophia Reena, Analysis of Liver Disorder using Data mining
      Algorithm, Global Journal of Computer Science and Technology, Vol.10 Issue14
      (Ver.10) November 2010, Page no.48.
[3]   Das SK, Balakrishnan V and Vasudevan DM, Alcohol: Its health and social impact in
      India, Natl Med J India 2006; 19:94-9.

Report:
[4]   International Autoimmune Hepatitis Group Report, J Hepatol 1999: 31:929-938.

Journal Papers:
[5]   Maeve M.Skelly, Peter D.James and Stephen D. Ryder, Findings on Liver biopsy to
      investgate abnormal liver function tests in the absence of diagnostic serology, Elsevier
      Journal of Hepatology 35(2001) 195-199.
[6]   Huda Yasin, Tahseen A. Jilani and Madiha Danish, Hepatitis-C Classification using
      Data mining Techniques, International Joural of Computer Applications (0957-8887),
      Volume 24-No.3, June 2011.
[7]   Sunita Soni and O.P.Vyas, Using Associative Classifiers for Predictive Analysis n
      Health Care Data Mining, International Journal of Computer Applications(0975-
      8887), Volume 4- No.5, July 2010.
[8]   Asha .T, S.Natarajan and K.N.B. Murthy, A Study of Associative Classifiers with
      Different Rule Evaluation Measures for Tuberculosis Prediction, IJCA Special Issue
      on Artificial Intelligence Techniques – Novel Approaches & Practical Applications,
      AIT-2011.

Proceedings Papers:
[9]   Xiaoxin Yin and Jiawei Han, CPAR: Classification based on Predictive Association
      Rules, Proceedings of the SIAM International Conference on Data Mining. San
      Francisco, CA: SIAM Press, 2003, pp:369-376.



                                              271
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 –
6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME

Books:
[10] Jiawei Han and Micheline Kamber, Data Mining Concepts and Techniques, Published
       by Elsevier, second edition – 2006.

Proceedings Papers:
[11] J. R. Quinlan and R. M. Cameron-Jones, FOIL: A Midterm Report, Proceedings of
      European Conference in Machine Learning, Vienna, Austria, 1993.
[12] Liu B., Hsu W., Ma Y., Integrating Classification and Association Rule mining,
      Proceedings of KDD, New York, pp.80-86, 1998.

Journal Papers:
[13] Z. Tang and Q. Liao, A new Class-based Associative Classification Algorithm,
      International Journal of Applied Mathematics, 2007

Proceedings Papers:
[14] Li W., Han J., Pie J., Accurate and Efficient Classification Based on Multiple Class-
      Association Rules, proceedings of IEEE International Conference on Data Mining,
      Washington DC, USA, pp. 369-380, 2001.




                                           272

Weitere ähnliche Inhalte

Was ist angesagt?

The Relationship between the usage of Drugs and Sport Performance
The Relationship between the usage of Drugs and Sport PerformanceThe Relationship between the usage of Drugs and Sport Performance
The Relationship between the usage of Drugs and Sport Performance
ijtsrd
 
HPPS : Heart Problem Prediction System using Machine Learning
HPPS : Heart Problem Prediction System using Machine LearningHPPS : Heart Problem Prediction System using Machine Learning
HPPS : Heart Problem Prediction System using Machine Learning
Nimai Chand Das Adhikari
 
IJBET-4-Narasimham et al
IJBET-4-Narasimham et alIJBET-4-Narasimham et al
IJBET-4-Narasimham et al
Gaurav Kaila
 
Multi Institutional Cohort to Facilitate Cardiovascular Disease Biomarker Val...
Multi Institutional Cohort to Facilitate Cardiovascular Disease Biomarker Val...Multi Institutional Cohort to Facilitate Cardiovascular Disease Biomarker Val...
Multi Institutional Cohort to Facilitate Cardiovascular Disease Biomarker Val...
HMO Research Network
 
Presentation to CCG - Capita Health Freakononics v3
Presentation to CCG - Capita Health Freakononics v3Presentation to CCG - Capita Health Freakononics v3
Presentation to CCG - Capita Health Freakononics v3
Mike Thorogood
 

Was ist angesagt? (11)

IRJET- Comparison of Techniques for Diabetes Detection in Females using Machi...
IRJET- Comparison of Techniques for Diabetes Detection in Females using Machi...IRJET- Comparison of Techniques for Diabetes Detection in Females using Machi...
IRJET- Comparison of Techniques for Diabetes Detection in Females using Machi...
 
PREVENTION OF HEART PROBLEM USING ARTIFICIAL INTELLIGENCE
PREVENTION OF HEART PROBLEM USING ARTIFICIAL INTELLIGENCEPREVENTION OF HEART PROBLEM USING ARTIFICIAL INTELLIGENCE
PREVENTION OF HEART PROBLEM USING ARTIFICIAL INTELLIGENCE
 
Screening of Promising Lead Molecules against Two Drug Targets in Ebola Virus...
Screening of Promising Lead Molecules against Two Drug Targets in Ebola Virus...Screening of Promising Lead Molecules against Two Drug Targets in Ebola Virus...
Screening of Promising Lead Molecules against Two Drug Targets in Ebola Virus...
 
The Relationship between the usage of Drugs and Sport Performance
The Relationship between the usage of Drugs and Sport PerformanceThe Relationship between the usage of Drugs and Sport Performance
The Relationship between the usage of Drugs and Sport Performance
 
HPPS : Heart Problem Prediction System using Machine Learning
HPPS : Heart Problem Prediction System using Machine LearningHPPS : Heart Problem Prediction System using Machine Learning
HPPS : Heart Problem Prediction System using Machine Learning
 
A.Muenzenmeyer_AACC_2016
A.Muenzenmeyer_AACC_2016A.Muenzenmeyer_AACC_2016
A.Muenzenmeyer_AACC_2016
 
system dynamics simulation model for cardiovascular heart disease risk
system dynamics simulation model for cardiovascular heart disease risksystem dynamics simulation model for cardiovascular heart disease risk
system dynamics simulation model for cardiovascular heart disease risk
 
Strive Teleconf Presentation Aug10 2005
Strive Teleconf Presentation Aug10 2005Strive Teleconf Presentation Aug10 2005
Strive Teleconf Presentation Aug10 2005
 
IJBET-4-Narasimham et al
IJBET-4-Narasimham et alIJBET-4-Narasimham et al
IJBET-4-Narasimham et al
 
Multi Institutional Cohort to Facilitate Cardiovascular Disease Biomarker Val...
Multi Institutional Cohort to Facilitate Cardiovascular Disease Biomarker Val...Multi Institutional Cohort to Facilitate Cardiovascular Disease Biomarker Val...
Multi Institutional Cohort to Facilitate Cardiovascular Disease Biomarker Val...
 
Presentation to CCG - Capita Health Freakononics v3
Presentation to CCG - Capita Health Freakononics v3Presentation to CCG - Capita Health Freakononics v3
Presentation to CCG - Capita Health Freakononics v3
 

Andere mochten auch (7)

Reefer Vehicle Redressal Program
Reefer Vehicle Redressal ProgramReefer Vehicle Redressal Program
Reefer Vehicle Redressal Program
 
Early detection of adult valve disease mitral stenosis
Early detection of adult valve disease mitral stenosisEarly detection of adult valve disease mitral stenosis
Early detection of adult valve disease mitral stenosis
 
Patent data clustering a measuring unit for innovators
Patent data clustering a measuring unit for innovatorsPatent data clustering a measuring unit for innovators
Patent data clustering a measuring unit for innovators
 
Logistic Times Feature - column
Logistic Times Feature - columnLogistic Times Feature - column
Logistic Times Feature - column
 
Cold chain: a Debate
Cold chain: a DebateCold chain: a Debate
Cold chain: a Debate
 
Innovation in Supply Chain - Interview
Innovation in Supply Chain - InterviewInnovation in Supply Chain - Interview
Innovation in Supply Chain - Interview
 
Understanding enterprise risk management and fair
Understanding enterprise risk management and fairUnderstanding enterprise risk management and fair
Understanding enterprise risk management and fair
 

Ähnlich wie Significance of integrated taxonomy approach in diverse liver chaoses

Improving the performance of k nearest neighbor algorithm for the classificat...
Improving the performance of k nearest neighbor algorithm for the classificat...Improving the performance of k nearest neighbor algorithm for the classificat...
Improving the performance of k nearest neighbor algorithm for the classificat...
IAEME Publication
 
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
BRNSSPublicationHubI
 
50120140506011
5012014050601150120140506011
50120140506011
IAEME Publication
 

Ähnlich wie Significance of integrated taxonomy approach in diverse liver chaoses (20)

Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease prediction
 
LIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUES
LIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUESLIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUES
LIVER DISEASE PREDICTION BY USING DIFFERENT DECISION TREE TECHNIQUES
 
A Tentative analysis of Liver Disorder using Data Mining Algorithms J48, Deci...
A Tentative analysis of Liver Disorder using Data Mining Algorithms J48, Deci...A Tentative analysis of Liver Disorder using Data Mining Algorithms J48, Deci...
A Tentative analysis of Liver Disorder using Data Mining Algorithms J48, Deci...
 
IRJET - Deep Multiple Instance Learning for Automatic Detection of Diabetic R...
IRJET - Deep Multiple Instance Learning for Automatic Detection of Diabetic R...IRJET - Deep Multiple Instance Learning for Automatic Detection of Diabetic R...
IRJET - Deep Multiple Instance Learning for Automatic Detection of Diabetic R...
 
Diabetes Prediction Using ML
Diabetes Prediction Using MLDiabetes Prediction Using ML
Diabetes Prediction Using ML
 
1 springer format chronic changed edit iqbal qc
1 springer format chronic changed edit iqbal qc1 springer format chronic changed edit iqbal qc
1 springer format chronic changed edit iqbal qc
 
Genetically Optimized Neural Network for Heart Disease Classification
Genetically Optimized Neural Network for Heart Disease ClassificationGenetically Optimized Neural Network for Heart Disease Classification
Genetically Optimized Neural Network for Heart Disease Classification
 
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
 
Improving the performance of k nearest neighbor algorithm for the classificat...
Improving the performance of k nearest neighbor algorithm for the classificat...Improving the performance of k nearest neighbor algorithm for the classificat...
Improving the performance of k nearest neighbor algorithm for the classificat...
 
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
 
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
 
IRJET- Disease Analysis and Giving Remedies through an Android Application
IRJET- Disease Analysis and Giving Remedies through an Android ApplicationIRJET- Disease Analysis and Giving Remedies through an Android Application
IRJET- Disease Analysis and Giving Remedies through an Android Application
 
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
Performance Evaluation of Data Mining Algorithm on Electronic Health Record o...
 
PREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUES
PREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUESPREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUES
PREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUES
 
Diagnosis Of Chronic Kidney Disease Using Machine Learning
Diagnosis Of Chronic Kidney Disease Using Machine LearningDiagnosis Of Chronic Kidney Disease Using Machine Learning
Diagnosis Of Chronic Kidney Disease Using Machine Learning
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
 
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
THE APPLICATION OF EXTENSIVE FEATURE EXTRACTION AS A COST STRATEGY IN CLINICA...
 
IRJET- Chronic Kidney Disease Prediction based on Naive Bayes Technique
IRJET- Chronic Kidney Disease Prediction based on Naive Bayes TechniqueIRJET- Chronic Kidney Disease Prediction based on Naive Bayes Technique
IRJET- Chronic Kidney Disease Prediction based on Naive Bayes Technique
 
50120140506011
5012014050601150120140506011
50120140506011
 

Mehr von iaemedu

Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
iaemedu
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using grid
iaemedu
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routing
iaemedu
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow application
iaemedu
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reordering
iaemedu
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challenges
iaemedu
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanism
iaemedu
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modification
iaemedu
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cma
iaemedu
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presence
iaemedu
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineering
iaemedu
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobiles
iaemedu
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacement
iaemedu
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approach
iaemedu
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environment
iaemedu
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
iaemedu
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networks
iaemedu
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classify
iaemedu
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imagery
iaemedu
 

Mehr von iaemedu (20)

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech in
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using grid
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routing
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow application
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reordering
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challenges
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanism
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modification
 
Prediction of customer behavior using cma
Prediction of customer behavior using cmaPrediction of customer behavior using cma
Prediction of customer behavior using cma
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presence
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineering
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobiles
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacement
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approach
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environment
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networks
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classify
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imagery
 

Significance of integrated taxonomy approach in diverse liver chaoses

  • 1. INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3,(IJCET) & TECHNOLOGY October-December (2012), © IAEME ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 3, Issue 3, October - December (2012), pp. 265-272 IJCET © IAEME: www.iaeme.com/ijcet.asp Journal Impact Factor (2012): 3.9580 (Calculated by GISI) ©IAEME www.jifactor.com SIGNIFICANCE OF INTEGRATED TAXONOMY APPROACH IN DIVERSE LIVER CHAOSES A.S.Aneeshkumar1, Dr. C. Jothi Venkateswaran2 1 Research Scholar, PG & Research Dept. of Computer Science, Presidency College, Chennai, India, aneesh_kumar777@yahoo.com 2 Research Supervisor &Dean, Dept. of Computer Science & Applications, Presidency College, Chennai, India, jothivenkateswaran@yahoo.co.in ABSTRACT Liver chaos is an associated factor with other serious diseases like cancer, kidney failure and cardio vascular problems. The preliminary stage of liver dysfunction can be classified as three, which are Hepatitis, Alcoholic Fatty Liver Disease (AFLD) and Non- alcoholic Fatty Liver Disease (NAFLD). In this paper we are applying various integrated classification models of Data Mining for the evaluation of performance. Keywords: Associative Classification, Classification Based on Association, Classification based on Multiple Association Rules, Classification based on Predictive Association Rules, First Order Inductive Learner, Preprocessing. I. INTRODUCTION Many health care organizations having large number of health related data without proper analysis or storage. In a successful organization, it is import to empower the staff and management with data warehousing based on critical thinking and knowledge management tools for strategic decision making [1]. Medical diagnosis is regarded as an important yet complicated task that needs to be executed accurately and efficiently. Regrettably all doctors do not possess expertise in every subspecialty fields and moreover there is a shortage of resource persons in certain places. Therefore, an automatic medical diagnosis system would probably be exceedingly beneficial by bringing all of them together [2]. Determining the early stages of liver disease is better to avoid other consequences. The common liver related problems seen in Indian society are Hepatitis (Viral attack), Alcoholic Fatty Liver Disease (AFLD) and Non-alcoholic Fatty Liver Disease (NAFLD). Hepatitis can be broadly classified as HAV, HBV, HCV, HDV, and HEV. The first three are 265
  • 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME very common in anywhere, but remaining two is seen very rarely and therefore such data are not available for this study. Normally, it is difficult to identify the exact reason of Liver dysfunction in some extend without further examinations and so in this work we are developing a model to predict the accurate type of disease without false assumptions. Human civilization since the time of immemorial to today, the alcohol is ubiquitous, with constantly changing patterns of alcohol intake around the world [3]. High level of continuous alcoholic consumption will lead to AFLD. Modern life style and lack of exercise are associated with an increased prevalence of diabetes mellitus (DM), Obesity, Hypertension, hyper triglycerides and which are calculated as major reasons of NAFLD. Hepatitis A may be spread primarily through food or water from an infected person and others are through sexual contact with an infected person, injection of drugs, infected women to infants and blood transfer. In some cases toxic drug effects will also be the factor of deviated results of the liver function tests and theses are seen in rare cases. Some hepatitis is considered to have autoimmune hepatitis, on the basis of combination of psychological and histological features, clinical and biochemical features as described by the International Autoimmune Hepatitis Group Report [4]. Radiologic imaging with Ultra-Sonography (US), Computed Tomography (CT) or Magnetic Resonance Imaging, used either singly or in combination, for detection of adequate threshold of fatty infiltrations in the liver. Then in the absence of obvious contraindications, a liver biopsy should be conducted in patients with unexplained sustained abnormalities of the liver biochemistry [5]. The greatest benefit of liver biopsy in unexplained abnormal LFT is to cover progressive liver disease either directly or through prompting a further diagnostic analysis. II. DATA DESCRIPTION The collected data for this study from a reputed hospital consists of all symptoms, doctor’s observation and liver biochemistry of the patients, where in addition to Liver Function Test (LFT), Triglycerides report also included for this study. Symptoms of Liver problems are uncertain in most cases. So we neglect such items and the following factors are the attributes for this work. TABLE 1. Biochemical Markers of LFT Lower Upper Test Items Limit Limit Bilirubin(D) 0.1 0.4 Bilirubin(T) 0.3 1.3 S.G.O.T 12 38 S.G.P.T. 7 41 Alkaline Phosphatase 80 290 Gamma GT 11/ 7 50/32 Total Proteins 6.7 8.6 Albumin 4.1 5.3 266
  • 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME Globulins 2.0 3.5 TABLE 2. Other dependent factors HBAIC Yes No Obesity Yes No BP Yes No Triglycerides Yes No Alcoholic consumption Yes No III. METHODOLOGY The combination of knowledge discovery, information retrieval, deductive learning and exploratory data analysis can be called as Data mining [6]. Different algorithms are involved in it to extract models or patterns for particular prediction. The word Association in Mining is finding frequent item sets and discovery of association or correlation among such frequent items in a huge transaction. Another data analysis method called Classification is used to identify some important entities for prediction, where it extracts different models for better grouping of similar data types. In recent years the data mining researchers proposed a new integrated predictive approach for better accuracy is known as associative classification (AC), which combines associative mining and classification. According to most of the studies, we can see that, it has better accuracy and rule evaluation when compare to other classifications and rule based classification techniques. Because some rules used in AC will not appear in basic classification techniques. AC follows the method of classification in associative rule evaluation and the major application of this is in medical and engineering field. Medical practitioners are always following an inherent approach of combinational symptom and report based diagnosis for disease identification. Normally AC performs its analysis in relational database and each entity consists of attributes like ܽଵ , ܽଶ , ܽଷ , … … … ܽ௡ and one of them will consider as Class label or Predictive attribute [7]. Rule generation, rule pruning and classification are the three steps of AC. The first AC algorithm introduced as Classification Based on Association (CBA) and it is based on the Apriori algorithm, which gives Class Associated Rule (CAR). These rules will pruned and then select most appropriate rules from it, for further classification. The CBA will mines all CARs and produce a classification from generated CARs, then it calculate the accuracy of the Association Classifier [8]. Multiple scan features of Apriori produce large number of rules and which maintain large computational time. Classification based on Multiple Association Rules (CMAR) and Classifications based on Predictive Association Rules (CPAR) are introduced later as the extension of CBA. The CMAR implements FP-growth algorithm instead of Apriori for the generation of frequent item sets [9]. It first generate all the association rules with certain support and confidence thresholds and then selects a small set of rules from them to form a classifier. Then after, it identifies the class label for the best rule with best confidence. When compare 267
  • 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME with previous classification it uses multiple rules in prediction, with the help of weighted ܺ ଶ and so CMAR provides more accuracy in classification. Classification based on Predictive Association Rules combines the advantages of both associative classification and traditional rule-based classification to attain maximum accuracy. It extends FP growth with interestingness and overlapping relationships in formation of rules. CPAR generates a smaller set of rules, with higher quality and lower redundancy in comparison with other associative classification techniques. So CPAR is much efficient in case of rule generation, prediction and accuracy. More over than that CPAR generates a small set of predictive rules directly from the dataset based on the rule prediction and coverage analysis with lesser time, which is opposite to generating candidate rules like in other Associative Classifications, where datasets contain a large number of rows and columns. So they needs more time to build the model. It follows dynamic programming pattern to avoid repeated calculation in rule generation and select the entire close to the best literals instead of selecting only the best literal only, so the important rules will not lost. IV. DATA PREPROCESSING To avoid certain inappropriate, duplicate, unrelated and missing fields it is necessary to implement preprocessing techniques. In our approach, we used dimensionality reduction to avoid patient’s psychodynamic factors and symptoms, because it may show a discrepancy according to the type and variation of Liver disorder. In our previous work, we used Principal Component Analysis (PCA) for this reduction. It searches for ݇ ݊-dimensional orthogonal vectors, which can bitterly represent the data, where ݇ ≤ ݊. the original data are thus projected onto a much smaller space, while it results dimensionality reduction. PCA always combines the essence of attributes by creating an alternative, smaller set of variables [10]. V. MODEL BUILDING In this paper, we build a model to evaluate the performances of four algorithms, where three are incorporated. Three separate and consolidate datasets are used here for this model construction. 5.1. FOIL First Order Inductive Learner is a sequential covering algorithm which learns first- order logic rules. The tuples of the class for learning rules are called positive tuples, where the remaining is called as negative tuples. The number of ‫ )݁ݒ݅ݐܽ݃݁݊(݁ݒ݅ݐ݅ݏ݋݌‬tuples covered by the rule ܴ1 is represented as ‫ )݃݁݊(ݏ݋݌‬and ‫ ݏ݋݌‬ூ (݊݁݃ ூ ) be the number of ‫ )݁ݒ݅ݐܽ݃݁݊(݁ݒ݅ݐ݅ݏ݋݌‬tuples covered by ܴ ூ . Basic idea of CPAR observed the concept of FOIL [11]. The information gain of rules which provide high accuracy and maximum positive tuples are ‫ ݏ݋݌‬ூ ‫ݏ݋݌‬ ‫ = ݊݅ܽܩ_ܮܫܱܨ‬ ூ + ݊݁݃ ூ − ݈‫ 2݃݋‬ ‫ݏ݋݌‬ ‫݃݁݊ + ݏ݋݌‬ 268
  • 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME which build rules to distinguish positive tuples from negative tuples. In cases of multiple classes, FOIL is applied to each class, where if a rule generates, the positive sample it covers, will be removed until all positive tuples in the data set are covered. 5.2. CBA CBA uses a heuristic method to construct the classifier, unlike Apriori with the satisfaction of minimum support and minimum confidence level, where the rules are organized according to decreasing precedence based on their confidence and support [12]. If a set of rule with same antecedent, then the rule with the highest confidence is selected to represent the set. When classifying a new tuple, the first rule satisfying the tuple is used to classify it. The classifier also contains a default rule, having lowest precedence, which specifies a default class for any new tuple that is not satisfied by any other rule in the classifier [13]. The set of these rules make a decision set for this classifier. 5.3. CMAR CMAR uses an enhanced FP-tree that maintains the distribution of class labels among tuples, which satisfying each frequent itemset, and it combine rule generation together with frequent itemset mining in a single step. For an example ܴ1 and ܴ2 are two rules, where antecedent of ܴ1 is more general than that of ܴ2 and ܿ‫ ,)2ܴ(݂݊݋ܿ ≥ )1ܴ(݂݊݋‬then ܴ2 is pruned. That defines the rules with low confidence can be pruned is a more generalized version with higher confidence exists [14]. 5.4. CPAR CPAR relaxes the FOIL application of removing positive samples by allowing the covered tuples to remain there, but it reduces the weight. The resulting rule from each class in this manner will merge to form the classifier rule set. But it produces fewer rules than other CBA and CMAR. For an example, (1) (2) (3) (2) (4) (1) (2) (1) (3) is generated for the following four rules. ܴ1 = (‫ܣ‬ଵ = 1, ‫ܣ‬ଶ = 2, ‫ܣ‬ଷ = 3) ܴ2 = (‫ܣ‬ଵ = 1, ‫ܣ‬ଷ = 2, ‫ܣ‬ଶ = 4) ܴ3 = (‫ܣ‬ଵ = 1, ‫ܣ‬ଶ = 1, ‫ܣ‬ସ = 2, ‫ܣ‬ଷ = 2) ܴ4 = (‫ܣ‬ଵ = 1, ‫ܣ‬ସ = 1, ‫ܣ‬ଷ = 2, ‫ܣ‬ଶ = 1, ‫ܣ‬ହ = 3 ) 269
  • 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME VI. ANALYSIS AND RESULTS As whispered earlier, because of ambiguity, we applied data reduction technique in preprocessing stage to reduce the dimensions. Then the attributes collected from the patients are shown in Table 1 and 2 for this work in addition to a class diagnosis. In Table 1, we can see the accuracy / Time of the classifier, where the full data set is used as a training set to build the model and again test it with the developed replica. First we applied these algorithms to identify viral infections of liver from the general category. In the second step we applied another set of data for classification, which related to the details of AFLD and the third dataset of NAFLD performs with lesser time. Finally we applied the combination of these three data sets to evaluate the performance. According to these results the average time taken by each method is same in Hepatitis data and the highest accuracy given by CPAR only. In case of AFLD data, four algorithms share a common time of 0.02 seconds with almost same and fine accuracy. The performance of CMAR is somewhat better here. In the next step also CPAR having better performance with 99.0 / 0.0 rate. I Finally in case of combined data, when compare to other three FOIL shows poor performance in both cases of accuracy and time, where CPAR completed within 0.02 seconds of time with good result. Figure 1 represents the performance of all these algorithms. TABLE 3. Evaluation of algorithms DATA TYPES FOIL CBA CMAR CPAR Hepatitis 88.26 / 0.04 91.70 / 0.05 91.98 / 0.04 92.0 / 0.04 AFLD 98.08 / 0.02 98.09 / 0.02 98.40 / 0.02 98.10 / 0.02 NAFLD 98.38 / 0.01 98.58 / 0.00 98.82 / 0.01 99.0 / 0.0 Consolidate 94.0 / 0.04 97.27 / 0.04 97.28 / 0.02 97.62 / 0.02 FOIL 100 98 96 94 92 90 88 86 84 CPAR 82 CBA CMAR Hepatitis infection ALD NALD Consolidate Fig 1. Rule quality performance 270
  • 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME VII. CONCLUSION The observed factors used for this model are very important for liver disease prevalence. Most of the liver problems are presented because of the life style and behaviour of the peoples. In this analysis accuracy shows the quality of the rule. The integration of multiple rules gives more accurate results. As the result the overall performance of CPAR is better than other classifiers. So this approach can be adapted to other application field like medical, insurance, banking, recruitment and psychology for better prediction in the field of their expected results. REFERENCES Journal Papers: [1] Shantakumar B. Patil and Y.S.Kumaraswamy, Intelligent and Effective heart Attack Prediction System Using Data Mining and Artificial Neural Network, European Journal of Scientific Research, ISSN 1450-216X Vol.31 No.4(2009), pp.642-656. [2] P.Rajeswari and G.Sophia Reena, Analysis of Liver Disorder using Data mining Algorithm, Global Journal of Computer Science and Technology, Vol.10 Issue14 (Ver.10) November 2010, Page no.48. [3] Das SK, Balakrishnan V and Vasudevan DM, Alcohol: Its health and social impact in India, Natl Med J India 2006; 19:94-9. Report: [4] International Autoimmune Hepatitis Group Report, J Hepatol 1999: 31:929-938. Journal Papers: [5] Maeve M.Skelly, Peter D.James and Stephen D. Ryder, Findings on Liver biopsy to investgate abnormal liver function tests in the absence of diagnostic serology, Elsevier Journal of Hepatology 35(2001) 195-199. [6] Huda Yasin, Tahseen A. Jilani and Madiha Danish, Hepatitis-C Classification using Data mining Techniques, International Joural of Computer Applications (0957-8887), Volume 24-No.3, June 2011. [7] Sunita Soni and O.P.Vyas, Using Associative Classifiers for Predictive Analysis n Health Care Data Mining, International Journal of Computer Applications(0975- 8887), Volume 4- No.5, July 2010. [8] Asha .T, S.Natarajan and K.N.B. Murthy, A Study of Associative Classifiers with Different Rule Evaluation Measures for Tuberculosis Prediction, IJCA Special Issue on Artificial Intelligence Techniques – Novel Approaches & Practical Applications, AIT-2011. Proceedings Papers: [9] Xiaoxin Yin and Jiawei Han, CPAR: Classification based on Predictive Association Rules, Proceedings of the SIAM International Conference on Data Mining. San Francisco, CA: SIAM Press, 2003, pp:369-376. 271
  • 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 3, Issue 3, October-December (2012), © IAEME Books: [10] Jiawei Han and Micheline Kamber, Data Mining Concepts and Techniques, Published by Elsevier, second edition – 2006. Proceedings Papers: [11] J. R. Quinlan and R. M. Cameron-Jones, FOIL: A Midterm Report, Proceedings of European Conference in Machine Learning, Vienna, Austria, 1993. [12] Liu B., Hsu W., Ma Y., Integrating Classification and Association Rule mining, Proceedings of KDD, New York, pp.80-86, 1998. Journal Papers: [13] Z. Tang and Q. Liao, A new Class-based Associative Classification Algorithm, International Journal of Applied Mathematics, 2007 Proceedings Papers: [14] Li W., Han J., Pie J., Accurate and Efficient Classification Based on Multiple Class- Association Rules, proceedings of IEEE International Conference on Data Mining, Washington DC, USA, pp. 369-380, 2001. 272