IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 10, Issue 6 (May. - Jun. 2013), PP 01-06
www.iosrjournals.org
Performance Evaluation of Different Data Mining Classification
Algorithm and Predictive Analysis
Syeda Farha Shazmeen1, Mirza Mustafa Ali Baig2, M. Reena Pawar3
1, 2, 3 Department of Information Technology, Balaji Institute of Technology and Science, Warangal, A.P., India
Abstract: Data mining is the process of discovering knowledge by analyzing large volumes of data from various
perspectives and summarizing it into useful information; it has become an essential component in many fields of
human life and is used to identify hidden patterns in large data sets. Classification techniques are supervised
learning techniques that assign each data item to a predefined class label. Classification is one of the most useful
techniques in data mining for building models from an input data set; such models are commonly used to predict
future data trends. In this paper we work with several data mining applications and classification algorithms.
The algorithms are applied to different datasets to measure their efficiency, to improve their performance through
data preprocessing and feature selection, and to predict new class labels.
Keywords: Classification, Mining Techniques, Algorithms.
I. Introduction
Data mining consists of a set of techniques that can be used to extract relevant and interesting
knowledge from data. Data mining covers several tasks such as association rule mining, classification and
prediction, and clustering. Classification [1] is one of the most useful techniques in data mining for building
classification models from an input data set. Classification techniques commonly build models that are
used to predict future data trends [2, 3].
Model construction: describing a set of predetermined classes
1. Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
2. The set of tuples used for model construction: training set.
3. The model is represented as classification rules, decision trees, or mathematical formulae.
4. The accuracy rate is the percentage of test set samples that are correctly classified by the model.
5. The test set must be independent of the training set; otherwise the accuracy estimate is inflated by over-fitting.
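Items 4 and 5 above can be sketched in a few lines of Python. This is only an illustration: the function names (`split_train_test`, `accuracy_rate`, `train_majority_model`), the toy data and the majority-class "model" are assumptions made for the example and are not part of the experiments reported in this paper.

```python
import random
from collections import Counter

def split_train_test(samples, test_fraction=0.3, seed=0):
    """Shuffle labelled samples and split them into disjoint training and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy_rate(model, test_set):
    """Percentage of test samples whose predicted label equals the true label."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return 100.0 * correct / len(test_set)

def train_majority_model(training_set):
    """Toy model (assumption for illustration): always predict the most common training label."""
    majority_label = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda features: majority_label

# Hypothetical data: (feature vector, class label) pairs.
data = [((i,), "a" if i % 3 else "b") for i in range(30)]
train, test = split_train_test(data)
model = train_majority_model(train)
print(f"accuracy on held-out test set: {accuracy_rate(model, test):.2f}%")
```

Because the model never sees the test samples during construction, the reported percentage estimates how the model behaves on unseen data rather than how well it memorized the training set.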
The goal of a predictive model is to learn from data whose results are known and then to predict
the results for unknown data sets by using the constructed model [13].
In classification and regression models the following techniques are mainly used:
1. Decision Trees
2. Artificial Neural Networks
3. Support Vector Machine
4. K-Nearest Neighbor
5. Naive Bayes
Before classification, data preprocessing techniques [4, 5] such as data cleaning and data selection are applied;
these techniques improve the ability of the algorithm to classify the data correctly.
II. Data Sets used in the application:
Iris dataset: We use Fisher's Iris dataset, which contains 5 attributes (four measured features and the class
label) and 150 instances. It consists of 50 samples from each of three species of Iris flowers (Iris setosa,
Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of
the sepal and the petal, in centimeters. Based on the combination of the four features, Fisher developed a linear
discriminant model to distinguish the species from each other. Here the dataset is used with data mining
classification algorithms to identify the class of an Iris flower as Iris-setosa, Iris-versicolor or Iris-virginica.
Liver Disorder: The dataset consists of 7 variables and 345 observed instances, each describing a single male
individual. The first 5 variables are blood test measurements that are thought to be sensitive to liver disorders
that might arise from excessive alcohol consumption. The sixth variable records the number of alcoholic drinks
consumed per day. The seventh variable is a selector on the dataset, used to split it into two sets; in this work
it is taken as the class identity. Among all the observations, 145 people belong to the liver-disorder
group (corresponding to selector number 2) and 200 people belong to the liver-normal group.
E. coli: Escherichia coli (commonly abbreviated E. coli) is a Gram-negative, rod-shaped bacterium that is
commonly found in the lower intestine of warm-blooded organisms (endotherms). This dataset consists of 8
attributes and 336 instances. The instances are distributed over 8 classes as follows:
Class   Description                                       Instances
cp      cytoplasm                                         143
im      inner membrane without signal sequence            77
pp      periplasm                                         52
imU     inner membrane, uncleavable signal sequence       35
om      outer membrane                                    20
omL     outer membrane lipoprotein                        5
imL     inner membrane lipoprotein                        2
imS     inner membrane, cleavable signal sequence         2
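The class distribution above is strongly imbalanced, which is worth keeping in mind when reading accuracy figures for this dataset. The short Python sketch below uses only the counts listed above to compute the accuracy of a trivial classifier that always predicts the largest class; the variable names are illustrative assumptions.

```python
# Class counts for the E. coli dataset as listed above.
class_counts = {
    "cp": 143, "im": 77, "pp": 52, "imU": 35,
    "om": 20, "omL": 5, "imL": 2, "imS": 2,
}

total = sum(class_counts.values())                        # number of instances
majority_class = max(class_counts, key=class_counts.get)  # most frequent class
baseline_accuracy = 100.0 * class_counts[majority_class] / total

print(f"{total} instances, majority class '{majority_class}', "
      f"baseline accuracy {baseline_accuracy:.2f}%")
```

Any classifier evaluated on this dataset should be compared against this majority-class baseline rather than against 0%.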
III. Classification algorithm:
Different classification algorithms [6, 7] were used for the performance evaluation; they are listed below.
1: - J48 (C4.5): J48 is an implementation of C4.5 [8] that builds decision trees from a set of training data in
the same way as ID3, using the concept of information entropy. The training data is a set S = {s1, s2, ...} of
already classified samples. Each sample si = (x1, x2, ...) is a vector where x1, x2, ... represent attributes or
features of the sample. Decision trees are efficient to use and show good accuracy on large amounts of data. At
each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into
subsets enriched in one class or the other.
2: - Naive Bayes: A naive Bayes classifier assumes that the presence or absence of a particular feature is
unrelated to the presence or absence of any other feature, given the class variable. Bayesian belief networks are
graphical models which, unlike the naive Bayesian classifier, allow the representation of dependencies among
subsets of attributes [10]. Bayesian belief networks can also be used for classification. Naive Bayes makes the
simplifying assumption that attributes are conditionally independent:
$$P(C_j \mid V) \propto P(C_j) \prod_{i=1}^{n} P(v_i \mid C_j)$$
where V is the data sample, v_i is the value of attribute i on the sample and C_j is the j-th class. This
assumption greatly reduces the computation cost, since only the class distribution needs to be counted.
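The counting-based computation described above can be sketched as follows. The sketch assumes categorical attributes and applies no smoothing, so an attribute value never seen with a class gives that class a score of zero; the function names and the toy weather-style data are hypothetical and serve only to illustrate the product P(C_j) multiplied by the P(v_i | C_j) terms.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for features, label in samples:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict(class_counts, value_counts, features):
    """Return the class C_j that maximises P(C_j) * prod_i P(v_i | C_j)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(C_j)
        for i, value in enumerate(features):
            score *= value_counts[(i, label)][value] / count  # P(v_i | C_j)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: (attribute values, class label).
samples = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
]
class_counts, value_counts = train_naive_bayes(samples)
print(predict(class_counts, value_counts, ("rainy", "mild")))
```

Training is a single pass over the data that only updates counters, which is the reduction in computation cost that the independence assumption buys.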
3: - k-nearest neighbor:
For continuous-valued target functions, the k-NN algorithm returns the mean value of the k nearest
neighbors. The distance-weighted nearest neighbor algorithm weights the contribution of each of the k neighbors
according to its distance to the query point x_q, giving greater weight to closer neighbors; the same weighting
applies to real-valued target functions. Averaging over the k nearest neighbors makes the method robust to noisy
data.
The Euclidean distance between two points X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn) is
$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
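A minimal Python sketch of distance-weighted k-nearest-neighbor classification with the Euclidean distance is given below. The weight 1/d^2 is one common choice and is an assumption here, as are the function names and the toy training points; the paper does not state which weighting was used in the experiments.

```python
import math
from collections import defaultdict

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def weighted_knn_predict(training_set, query, k=3):
    """Classify `query` by a distance-weighted vote among its k nearest neighbours."""
    neighbours = sorted(training_set, key=lambda s: euclidean_distance(s[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbours:
        d = euclidean_distance(features, query)
        if d == 0.0:
            return label  # exact match: use its label directly
        votes[label] += 1.0 / d ** 2  # closer neighbours get a larger weight
    return max(votes, key=votes.get)

# Hypothetical two-class training data: (feature vector, class label).
training = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((1.1, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.5, 4.5), "b"),
]
print(weighted_knn_predict(training, (5.1, 5.0), k=3))
```

In the usage line, the third nearest neighbour belongs to class "a" but lies far from the query, so its vote is heavily down-weighted and the two close "b" neighbours decide the result.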
4: - Neural Network:
Neural networks have emerged as an important tool for classification. Recent research
activity in neural classification has established that neural networks are a promising alternative to various
conventional classification methods. A key theoretical advantage is that neural networks [9] are data-driven,
self-adaptive methods: they can adjust themselves to the data without any explicit specification of a functional
or distributional form for the underlying model.
5: - Support Vector Machine:
A classification method for both linear and nonlinear data. It uses a nonlinear mapping to
transform the original training data into a higher dimension. Within the new dimension, it searches for the linear
optimal separating hyperplane (i.e., "decision boundary"). With an appropriate nonlinear mapping to a
sufficiently high dimension, data from two classes can always be separated by a hyperplane. SVM finds this
hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors).
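The effect of the nonlinear mapping can be illustrated with a small Python sketch. It is not an SVM solver: the mapping x -> (x, x^2), the toy points and the hand-picked hyperplane are assumptions chosen only to show that data which no single threshold can separate in one dimension becomes linearly separable after mapping to a higher dimension.

```python
def feature_map(x):
    """Hypothetical nonlinear mapping from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(points):
    """True if a single threshold on x puts each class on its own side."""
    labels = [label for _, label in sorted(points)]  # labels ordered by x
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1

def hyperplane_side(w, b, z):
    """Sign of w.z + b for a mapped point z."""
    value = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if value > 0 else -1

# Toy 1-D data: class -1 lies between the two groups of class +1.
points = [(-2.0, 1), (-1.5, 1), (-0.5, -1), (0.0, -1), (0.5, -1), (1.5, 1), (2.0, 1)]

# Hand-picked hyperplane x^2 = 1 in the mapped space (not a trained SVM).
w, b = (0.0, 1.0), -1.0

print("separable in 1-D:", separable_by_threshold(points))
print("separated after mapping:",
      all(hyperplane_side(w, b, feature_map(x)) == y for x, y in points))
```

A real SVM would additionally choose, among all separating hyperplanes in the mapped space, the one with the largest margin to the support vectors.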
IV. Implementation
The following tools and technologies were used for the experimentation. RapidMiner [14] is an
open-source learning environment for data mining and machine learning. This environment can be used to extract
meaning from a dataset. It offers hundreds of machine learning operators, helpful pre- and post-processing
operators, descriptive graphic visualizations, and many other features. It is available as a stand-alone
application for data analysis and as a data-mining engine that can be integrated into other products.
Weka[15] is a collection of machine learning algorithms for data mining tasks. The algorithms can
either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-
processing, classification, regression, clustering, association rules, and visualization. It is also well suited for
developing new machine learning schemes. Weka provides access to SQL databases using Java Database
Connectivity and can process the result returned by a database query.
JFreeChart [16] is an open-source library for the Java programming language that allows users to easily
generate graphs and charts. It is particularly effective when a user needs to regenerate graphs that change on
a frequent basis. JFreeChart supports pie charts (2D and 3D), bar charts, line charts, scatter plots, time series
charts, and high-low-open-close charts.
V. Experiment and Results:
The different algorithms are applied to different dataset and the performance is evaluated [11, 12]. The
information gain measure is used to select the test attribute at each node in the tree. Such a measure is referred
to as an attribute selection measure or a measure of the goodness of split. The attribute with the highest
information gain is chosen as the test attribute for the current node. This attribute minimizes the information
needed to classify the samples in the resulting partitions and reflects the least randomness or “impurity” in these
partitions.
Information gain
If a set S of records is partitioned into classes C1, C2, C3, ..., Ci on the basis of the categorical attribute,
then the information needed to identify the class of an element of S is denoted by:
$$I(S) = -\sum_{i} p_i \log_2 p_i$$
where p_i is the probability of the partition C_i. The entropy of an attribute A that partitions S into subsets
S_1, S_2, ..., S_v can then be written as
$$E(A) = \sum_{j=1}^{v} \frac{|S_j|}{|S|} I(S_j)$$
Thus the information gain in performing a branching with attribute A can be calculated with this equation:
Gain (A) = I (S) - E (A)
After the data has been preprocessed, the information gains needed to construct the decision tree are computed,
and the decision tree is then constructed from them.
Gain (sepal length) =0.44
Gain (sepal width) =0.0
Gain (petal length) =1
Gain (petal width) =1
Fig. 1: J48 decision tree for the Iris dataset
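The relation Gain (A) = I (S) - E (A) can be sketched in Python for categorical attributes as follows. The function names and the four-row toy dataset are assumptions for illustration; the sketch does not reproduce the gain values reported above, which were obtained with the tools described in Section IV.

```python
import math
from collections import Counter

def information(labels):
    """I(S) = -sum_i p_i * log2(p_i) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def expected_information(rows, labels, attribute):
    """E(A): information of the partitions induced by `attribute`, weighted by size."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    total = len(labels)
    return sum(len(part) / total * information(part) for part in partitions.values())

def gain(rows, labels, attribute):
    """Gain(A) = I(S) - E(A)."""
    return information(labels) - expected_information(rows, labels, attribute)

# Hypothetical discretised data: one attribute predicts the class, the other does not.
rows = [
    {"petal": "short", "sepal": "wide"},
    {"petal": "short", "sepal": "narrow"},
    {"petal": "long", "sepal": "wide"},
    {"petal": "long", "sepal": "narrow"},
]
labels = ["setosa", "setosa", "virginica", "virginica"]

for attribute in ("petal", "sepal"):
    print(attribute, round(gain(rows, labels, attribute), 3))
```

In this toy data the "petal" attribute splits the samples into pure partitions and so receives the full gain, while "sepal" leaves each partition as mixed as the whole set and receives a gain of zero. The attribute with the highest gain would be chosen as the test attribute for the node.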
Feature selection can be applied to remove the zero-weight attributes; removing those attributes can
improve performance. The most useful part of this step is attribute selection:
1. Select relevant attributes.
2. Remove redundant and/or irrelevant attributes.
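The two steps above can be sketched in Python using the attribute weights reported for the Iris dataset. The function names, the threshold parameter and the sample row are illustrative assumptions; the actual selection in the experiments was carried out with the tools described in Section IV.

```python
# Attribute weights (information gain) reported above for the Iris dataset.
weights = {
    "sepal length": 0.44,
    "sepal width": 0.0,
    "petal length": 1.0,
    "petal width": 1.0,
}

def select_attributes(weights, threshold=0.0):
    """Keep attributes whose weight is strictly greater than `threshold`."""
    return [name for name, w in weights.items() if w > threshold]

def project(row, selected):
    """Drop every attribute of `row` that was not selected."""
    return {name: row[name] for name in selected}

selected = select_attributes(weights)

# Hypothetical sample row, used only to show the projection.
sample = {"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4, "petal width": 0.2}
reduced = project(sample, selected)
print(selected)
print(reduced)
```

With the default threshold only the zero-weight attribute (sepal width) is removed; raising the threshold would also discard weakly informative attributes.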
The screenshots and results are shown below.
Datasets         Classification   Performance
Iris             J48              95.33%
                 Neural net       97.33%
E. coli          Decision tree    80.06%
                 Neural net       83.01%
Liver Disorder   Neural net       67.59%
                 J48              69.58%
Table 1: Classification results using different algorithms
Fig. 2: Classification result pie chart for the liver disorder dataset
Fig. 3: Improved classification result for liver disorder using neural net with removal of zero-weight attributes
Fig. 4: Classification result for the E. coli dataset using neural net
Fig. 5: Pie chart of the E. coli dataset
Fig. 6: Classification result for the Iris dataset using neural net
Fig. 7: Pie chart for the Iris dataset
Predictive analysis can be used to predict the class label of new data. Among the different algorithms applied,
Neural Network and K-Nearest Neighbor were the most effective.
Datasets         Classification   Performance of Prediction
Iris             Naive Bayes      Incorrect
                 Neural net       Correct
E. coli          SVM              Incorrect
                 KNN              Correct
Liver Disorder   Decision tree    Incorrect
                 KNN              Correct
Table 2: Predictions of new class labels
VI. Conclusion:
Classification is one of the most useful techniques in data mining for building classification models from an
input data set. Choosing the right data mining technique is an important factor in overcoming the problems of a
given application. In this paper the results of different classification algorithms applied to different datasets
are compared to find which algorithm produces the most effective results, and the results are displayed
graphically.
References:
[1]. Serhat Özekes and A. Yilmaz Çamurcu: "Classification and Prediction in a Data Mining Application", Journal of
Marmara for Pure and Applied Sciences, 18 (2002) 159-174, Marmara University, Printed in Turkey.
[2]. Kaushik H and Raviya Biren Gajjar: "Performance Evaluation of Different Data Mining Classification Algorithm Using
WEKA", Indian Journal of Research (PARIPEX), Volume 2, Issue 1, January 2013, ISSN 2250-1991.
[3]. Wai-Ho Au, Keith C. C. Chan, Xin Yao: "A Novel Evolutionary Data Mining Algorithm with Applications to Churn
Prediction", IEEE Transactions on Evolutionary Computation, Vol. 7, No. 6, Dec 2003, pp. 532-545.
[4]. Reza Allahyari Soeini and Keyvan Vahidy Rodpysh: "Evaluations of Data Mining Methods in Order to Provide the Optimum
Method for Customer Churn Prediction: Case Study Insurance Industry", 2012 International Conference on Information and
Computer Applications (ICICA 2012), IPCSI vol. 24 (2012), IACSIT Press, Singapore.
[5]. Surjeet Kumar Yadav and Saurabh Pal: "Data Mining: A Prediction for Performance Improvement of Engineering Students
using Classification", World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 2,
No. 2, 51-56, 2012.
[6]. Zhong, N.; Zhou, L.: "Methodologies for Knowledge Discovery and Data Mining", The Third Pacific-Asia Conference,
PAKDD-99, Beijing, China, April 26-28, 1999; Proceedings, Springer Verlag (1999).
[7]. Raj Kumar, Dr. Rajesh Verma: "Classification Algorithms for Data Mining: A Survey", International Journal of
Innovations in Engineering and Technology (IJIET).
[8]. Fayyad, U.: "Mining Databases: Towards Algorithms for Knowledge Discovery", IEEE Bulletin of the Technical Committee
on Data Engineering, 21 (1) (1998) 41-48.
[9]. Ling Liu: "From Data Privacy to Location Privacy: Models and Algorithms", September 23-28, 2007, Vienna, Austria.
[10]. Qi Li and Donald W. Tufts: "Principal Feature Classification", IEEE Transactions on Neural Networks, Vol. 8, No. 1,
January 1997.
[11]. Deen, Ahmad: "Classification Based on Association-Rule Mining Techniques: A General Survey and Empirical Comparative
Evaluation", Ubiquitous Computing and Communication Journal, Vol. 5, No. 3.
[12]. Sanjay D. Sawaitul, Prof. K. P. Wagh, Dr. P. N. Chatur: "Classification and Prediction of Future Weather by using
Back Propagation Algorithm-An Approach", International Journal of Emerging Technology and Advanced Engineering,
www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 1, January 2012).
[13]. Qasem A. Al-Radaideh & Eman Al Nagi: "Using Data Mining Techniques to Build a Classification Model for Predicting
Employees Performance", (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 2, 2012.
[14]. RapidMiner, an open-source learning environment for data mining and machine learning.
http://rapid-i.com/content/view/181/190/
[15]. Weka, a collection of machine learning algorithms for data mining tasks.
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
[16]. JFreeChart, an open source library available for Java that allows users to easily generate graphs and charts.
http://www.jfree.org/index.html
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
 
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCERKNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER
 
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETSURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
 
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...
 
Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey paper
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A Review
 
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORKAN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
 
An Approach for IRIS Plant Classification Using Neural Network
An Approach for IRIS Plant Classification Using Neural Network  An Approach for IRIS Plant Classification Using Neural Network
An Approach for IRIS Plant Classification Using Neural Network
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniques
 
Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for proper
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptxNikhil Raut
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 

Recently uploaded (20)

Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptx
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 

Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis

IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 10, Issue 6 (May. - Jun. 2013), PP 01-06
www.iosrjournals.org

Syeda Farha Shazmeen1, Mirza Mustafa Ali Baig2, M. Reena Pawar3
1, 2, 3 Department of Information Technology, Balaji Institute of Technology and Science, Warangal, A.P, India

Abstract: Data mining is the knowledge discovery process of analyzing large volumes of data from various perspectives and summarizing them into useful information; data mining has become an essential component in various fields of human life. It is used to identify hidden patterns in a large data set. Classification techniques are supervised learning techniques that classify data items into predefined class labels. Classification is one of the most useful techniques in data mining for building classification models from an input data set; these techniques commonly build models that are used to predict future data trends. In this paper we have worked with different data mining applications and various classification algorithms. These algorithms have been applied to different datasets to find out the efficiency of each algorithm, to improve the performance by applying data preprocessing techniques and feature selection, and to predict new class labels.

Keywords: Classification, Mining Techniques, Algorithms.

I. Introduction
Data mining consists of a set of techniques that can be used to extract relevant and interesting knowledge from data. Data mining has several tasks such as association rule mining, classification and prediction, and clustering. Classification [1] is one of the most useful techniques in data mining for building classification models from an input data set. The classification techniques used commonly build models that are used to predict future data trends [2, 3].
Model construction: describing a set of predetermined classes
1. Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
2. The set of tuples used for model construction is the training set.
3. The model is represented as classification rules, decision trees, or mathematical formulae.
4. Accuracy rate is the percentage of test set samples that are correctly classified by the model.
5. The test set is independent of the training set; otherwise the estimated accuracy is over-optimistic because over-fitting goes undetected.

The goal of predictive models is to construct a model from the results of known data and to use the constructed model to predict the results of unknown data sets [13]. In classification and regression models the following techniques are mainly used:
1. Decision Trees
2. Artificial Neural Networks
3. Support Vector Machine
4. K-Nearest Neighbor
5. Naive Bayes

Before classification, data preprocessing techniques [4, 5] such as data cleaning and data selection are applied; these techniques improve the ability of the algorithm to classify the data correctly.

II. Data Sets used in the application:
IRIS Dataset: We make use of Fisher's Iris dataset, containing 5 attributes and 150 instances. It consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepal and petal, in centimeters. Based on the combination of the four features, Fisher developed a linear discriminant model to distinguish the species from each other. Here, data mining classification algorithms are used to identify the class of an Iris flower as Iris-setosa, Iris-versicolor or Iris-virginica.

Liver Disorder: The dataset consists of 7 variables and 345 observed instances.
The first 5 variables are measurements taken by blood tests that are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. The sixth variable is a sort of selector variable. The subjects are single male individuals. The seventh variable is a selector on the dataset, used to split it into two sets, indicating the class identity. Among all the observations, there are 145 people belonging to the liver-disorder group (corresponding to selector number 2) and 200 people belonging to the liver-normal group.

E-coli: Escherichia coli (commonly abbreviated E. coli) is a Gram-negative, rod-shaped bacterium that is commonly found in the lower intestine of warm-blooded organisms (endotherms). This dataset consists of 8 attributes and 336 instances. The instances are distributed over 8 classes:
cp (cytoplasm): 143
im (inner membrane without signal sequence): 77
pp (periplasm): 52
imU (inner membrane, uncleavable signal sequence): 35
om (outer membrane): 20
omL (outer membrane lipoprotein): 5
imL (inner membrane lipoprotein): 2
imS (inner membrane, cleavable signal sequence): 2

III. Classification algorithm:
Different classification algorithms [6, 7] were used for the performance evaluation; they are listed below.

1: J48 (C4.5): J48 is an implementation of C4.5 [8] that builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set S = s1, s2, ... of already classified samples. Each sample si = x1, x2, ... is a vector where x1, x2, ... represent attributes or features of the sample. Decision trees are efficient to use and display good accuracy on large amounts of data. At each node of the tree, C4.5 chooses the one attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other.

2: Naive Bayes: A naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable.
Bayesian belief networks are graphical models which, unlike the naive Bayesian classifier, allow the representation of dependencies among subsets of attributes [10]. Bayesian belief networks can also be used for classification. Naive Bayes makes a simplifying assumption, namely that attributes are conditionally independent:

$$P(C_j \mid V) \propto P(C_j) \prod_{i=1}^{n} P(v_i \mid C_j)$$

where V is the data sample, v_i is the value of attribute i on the sample and C_j is the j-th class. This greatly reduces the computation cost, since only the class distribution has to be counted.

3: k-nearest neighbor: For continuous-valued target functions, the k-NN algorithm calculates the mean value of the k nearest neighbors. The distance-weighted nearest neighbor algorithm weights the contribution of each of the k neighbors according to its distance to the query point x_q, giving greater weight to closer neighbors; the same applies to real-valued target functions. Averaging over the k nearest neighbors makes the method robust to noisy data. The Euclidean distance between two points X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn) is

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
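The k-nearest neighbor description above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper (the experiments were run in RapidMiner and Weka); the function names, the toy data points and the choice of 1/d^2 as the distance weight are assumptions made here for illustration only.

```python
import math
from collections import defaultdict


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_classify(train, query, k=3):
    """Distance-weighted k-NN vote; `train` is a list of (features, label)."""
    neighbors = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        d = euclidean(features, query)
        if d == 0.0:
            return label  # exact match: take its label directly
        # Assumed weighting scheme: closer neighbors get a larger weight.
        votes[label] += 1.0 / (d * d)
    return max(votes, key=votes.get)


def knn_regress(train, query, k=3):
    """Unweighted k-NN for continuous targets: mean of the k nearest values."""
    neighbors = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    return sum(target for _, target in neighbors) / len(neighbors)


# Hypothetical toy data, not taken from the paper's datasets.
toy_classes = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
               ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
toy_values = [((0.0,), 1.0), ((1.0,), 3.0), ((10.0,), 100.0)]

predicted_label = knn_classify(toy_classes, (1.1, 1.0), k=3)
predicted_value = knn_regress(toy_values, (0.5,), k=2)
```

With k = 3 the query point has two close neighbors of class "a" and one distant neighbor of class "b", so the weighted vote favors "a"; the regression variant simply averages the targets of the two nearest points.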
4: Neural Network: Neural networks have emerged as an important tool for classification. Recent research activity in neural classification has established that neural networks are a promising alternative to various conventional classification methods. The advantage of neural networks lies in the following theoretical aspect: neural networks [9] are data-driven, self-adaptive methods, in that they can adjust themselves to the data without any explicit specification of functional or distributional form for the underlying model.

5: Support Vector Machine: A classification method for both linear and nonlinear data. It uses a nonlinear mapping to transform the original training data into a higher dimension. Within the new dimension, it searches for the linear optimal separating hyperplane (i.e., the "decision boundary"). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors).

IV. Implementation
The following tools and technologies were used for the experimentation.

RapidMiner [14] is an open-source learning environment for data mining and machine learning. This environment can be used to extract meaning from a dataset. There are hundreds of machine learning operators to choose from, helpful pre- and post-processing operators, descriptive graphic visualizations, and many other features. It is available as a stand-alone application for data analysis and as a data-mining engine for integration into other products.

Weka [15] is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code.
Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes. Weka provides access to SQL databases using Java Database Connectivity and can process the result returned by a database query.

JFreeChart [16] is an open-source library for the Java programming language that allows users to easily generate graphs and charts. It is particularly effective when a user needs to regenerate graphs that change frequently. JFreeChart supports pie charts (2D and 3D), bar charts, line charts, scatter plots, time series charts, and high-low-open-close charts.

V. Experiment and Results:
The different algorithms are applied to the different datasets and the performance is evaluated [11, 12]. The information gain measure is used to select the test attribute at each node in the tree. Such a measure is referred to as an attribute selection measure or a measure of the goodness of split. The attribute with the highest information gain is chosen as the test attribute for the current node. This attribute minimizes the information needed to classify the samples in the resulting partitions and reflects the least randomness or "impurity" in these partitions.

Information gain: If a set S of records is partitioned into classes C1, C2, C3, ..., Ci on the basis of the categorical attribute, then the information needed to identify the class of an element of S is denoted by:

$$I(S) = -\sum_{i} p_i \log_2 p_i$$

where p_i is the probability of the partition C_i. Thus the entropy of an attribute A that splits S into subsets S_1, ..., S_v can be written as:

$$E(A) = \sum_{j=1}^{v} \frac{|S_j|}{|S|} \, I(S_j)$$

Thus the information gain from branching on attribute A can be calculated with this equation:

$$\mathrm{Gain}(A) = I(S) - E(A)$$

After preprocessing the data, the information gains needed to construct the decision tree are computed, and then the decision tree is constructed.

Gain (sepal length) = 0.44
Gain (sepal width) = 0.0
Gain (petal length) = 1
Gain (petal width) = 1

Fig. 1: J48 decision tree for the iris dataset

Feature selection can be applied to remove the zero-weight attributes; removing those attributes can improve performance.
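The relation Gain(A) = I(S) - E(A) and the removal of zero-weight attributes described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the RapidMiner or Weka procedure used in the paper: it treats every attribute as categorical, it assumes that an attribute's weight is its information gain, and the function names and toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """I(S): information needed to identify the class of an element of S."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """Gain(A) = I(S) - E(A) for one categorical attribute A."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # E(A): entropy of each partition, weighted by its share of the records.
    expected = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - expected


def select_attributes(columns, labels, eps=1e-12):
    """Return all gains and the names of attributes whose gain is non-zero."""
    gains = {name: information_gain(vals, labels)
             for name, vals in columns.items()}
    kept = [name for name, gain in gains.items() if gain > eps]
    return gains, kept


# Hypothetical toy records, not the iris data used in the paper.
toy_labels = ["yes", "yes", "no", "no"]
toy_columns = {
    "perfect": ["a", "a", "b", "b"],   # separates the classes completely
    "useless": ["a", "b", "a", "b"],   # carries no class information
}
toy_gains, toy_kept = select_attributes(toy_columns, toy_labels)
```

In this toy example the attribute that separates the two classes completely keeps the full one bit of information, while the attribute with zero gain is dropped, mirroring the zero gain reported for sepal width above. Continuous attributes such as the iris measurements would first need a threshold or discretization step, which this sketch omits.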
The most useful part of this is attribute selection:
1. Select relevant attributes.
2. Remove redundant and/or irrelevant attributes.

The screenshots and results are displayed below.

Datasets          Classification    Performance
Iris              J48               95.33%
Iris              Neural net        97.33%
E-coli            Decision tree     80.06%
E-coli            Neural net        83.01%
Liver Disorder    Neural net        67.59%
Liver Disorder    J48               69.58%
Table 1: Classification results using different algorithms

Fig. 2: Classification result pie chart for the liver disorder dataset
Fig. 3: Improved classification result for liver disorder using neural net with removal of zero-weight attributes
Fig. 4: Classification result for the E-coli dataset using neural net
Fig. 5: Pie chart of the E-coli dataset
Fig. 6: Classification result for the iris dataset using neural net
Fig. 7: Pie chart for the iris dataset
Predictive analysis can be used to predict the class label of new data. Different algorithms were used, but Neural Network and K-Nearest Neighbor were the most effective.

Datasets          Classification    Performance of Prediction
Iris              Naive Bayes       Incorrect
Iris              Neural net        Correct
E-coli            SVM               Incorrect
E-coli            KNN               Correct
Liver Disorder    Decision tree     Incorrect
Liver Disorder    KNN               Correct
Table 2: Predictions of new class labels

VI. Conclusion:
Classification is one of the most useful techniques in data mining for building classification models from an input data set. Choosing the right data mining technique is another important factor in solving the problem. The results of the different classification algorithms were compared to find which algorithm produces the most effective results, and the results are displayed graphically. The performance of the different classification algorithms was compared across the different datasets.

References:
[1]. Serhat Özekes and A. Yılmaz Çamurcu: "Classification and Prediction in a Data Mining Application", Journal of Marmara for Pure and Applied Sciences, 18 (2002) 159-174, Marmara University, Printed in Turkey.
[2]. Kaushik H and Raviya Biren Gajjar, "Performance Evaluation of Different Data Mining Classification Algorithm Using WEKA", Indian Journal of Research (PARIPEX), Volume 2, Issue 1, January 2013, ISSN 2250-1991.
[3]. Wai-Ho Au, Keith C. C. Chan, Xin Yao: "A Novel Evolutionary Data Mining Algorithm with Applications to Churn Prediction", IEEE Transactions on Evolutionary Computation, Vol. 7, No. 6, Dec 2003, pp. 532-545.
[4]. Reza Allahyari Soeini and Keyvan Vahidy Rodpysh: "Evaluations of Data Mining Methods in Order to Provide the Optimum Method for Customer Churn Prediction: Case Study Insurance Industry", 2012 International Conference on Information and Computer Applications (ICICA 2012), IPCSI vol. 24 (2012), IACSIT Press, Singapore.
[5]. Surjeet Kumar Yadav and Saurabh Pal: "Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification", World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 2, No. 2, 51-56, 2012.
[6]. Zhong, N.; Zhou, L.: "Methodologies for Knowledge Discovery and Data Mining", The Third Pacific-Asia Conference, PAKDD-99, Beijing, China, April 26-28, 1999; Proceedings, Springer Verlag, (1999).
[7]. Raj Kumar, Dr. Rajesh Verma: "Classification Algorithms for Data Mining: A Survey", International Journal of Innovations in Engineering and Technology (IJIET).
[8]. Fayyad, U.: "Mining Databases: Towards Algorithms for Knowledge Discovery", IEEE Bulletin of the Technical Committee on Data Engineering, 21 (1) (1998) 41-48.
[9]. Ling Liu: "From Data Privacy to Location Privacy: Models and Algorithms", September 23-28, 2007, Vienna, Austria.
[10]. Qi Li and Donald W. Tufts: "Principal Feature Classification", IEEE Transactions on Neural Networks, Vol. 8, No. 1, January 1997.
[11]. Deen, Ahmad: "Classification based on association-rule mining techniques: a general survey and empirical comparative evaluation", Ubiquitous Computing and Communication Journal, vol. 5, no. 3.
[12]. Sanjay D. Sawaitul, Prof. K. P. Wagh, Dr. P. N. Chatur: "Classification and Prediction of Future Weather by using Back Propagation Algorithm-An Approach", International Journal of Emerging Technology and Advanced Engineering, www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 1, January 2012).
[13]. Qasem A. Al-Radaideh & Eman Al Nagi: "Using Data Mining Techniques to Build a Classification Model for Predicting Employees Performance", (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 2, 2012.
[14]. RapidMiner is an open-source learning environment for data mining and machine learning. http://rapid-i.com/content/view/181/190/
[15]. Weka is a collection of machine learning algorithms for data mining tasks. http://www.cs.waikato.ac.nz/ml/weka/downloading.html
[16]. JFreeChart is an open-source library available for Java that allows users to easily generate graphs and charts. http://www.jfree.org/index.html