SlideShare a Scribd company logo
1 of 6
Download to read offline
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 10, Issue 6 (May. - Jun. 2013), PP 01-06
www.iosrjournals.org
www.iosrjournals.org 1 | Page
Performance Evaluation of Different Data Mining Classification
Algorithm and Predictive Analysis
Syeda Farha Shazmeen1
, Mirza Mustafa Ali Baig2
, M.Reena Pawar3
1, 2, 3
Department of Information Technology, Balaji Institute of Technology and Science, Warangal, A.P, India,
Abstract: Data mining is the knowledge discovery process by analyzing the large volumes of data from various
perspectives and summarizing it into useful information; data mining has become an essential component in
various fields of human life. It is used to identify hidden patterns in a large data set. Classification techniques
are supervised learning techniques that classify data item into predefined class label. It is one of the most useful
techniques in data mining to build classification models from an input data set; these techniques commonly
build models that are used to predict future data trends. In this paper we have worked with different data mining
applications and various classification algorithms, these algorithms have been applied on different dataset to
find out the efficiency of the algorithm and improve the performance by applying data preprocessing techniques
and feature selection and also prediction of new class labels.
Keywords: Classification, Mining Techniques, Algorithms.
I. Introduction
Data miming consists of a set of techniques that can be used to extract relevant and interesting
knowledge from data. Data mining has several tasks such as association rule mining, classification and
prediction, and clustering. Classification [1] is one of the most useful techniques in data mining to build
classification models from an input data set. The used classification techniques commonly build models that are
used to predict future data trends [2, 3].
Model construction: describing a set of predetermined classes
1. Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
2. The set of tuples used for model construction: training set.
3. The model is represented as classification rules, decision trees, or mathematical formulae.
4. Accuracy rate is the percentage of test set samples that are correctly classified by the model.
5. Test set is independent of training set, otherwise over-fitting will occur.
The goal of the predictive models is to construct a model by using the results of the known data and is to predict
the results of unknown data sets by using the constructed model [13].
In the classification and regression models the following techniques are mainly used
1. Decision Trees;
2. Artificial Neural Networks
3. Support Vector Machine
4. K-Nearest Neighbor
5. Navie-Bayes.
Before classification, data preprocessing techniques [4, 5] like data cleaning, data selection is applied
this techniques improve the efficiency of the algorithm to classify the data correctly.
II. Data Sets used in the application:
IRIS Datasets: -we make use of a large database „Fisher‟s Iris Dataset‟ containing 5 attributes and 150
instances. It consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris
versicolor). Four features were measured from each sample; they are the length and the width of sepal and
petal, in centimeters. Based on the combination of the four features, Fisher developed a linear discriminant
model to distinguish the species from each other classification method to identify the class of Iris flower as Iris-
setosa, Iris-versicolor or Iris-virginica using data mining classification algorithm.
Liver Disorder: -The observations in the dataset consist of 7 variables and 345 observed instances. The first 5
variables are measurements taken by blood tests that are thought to be sensitive to liver disorders and might
arise from excessive alcohol consumption. The sixth variable is a sort of selector variable. The subjects are
single male individuals. The seventh variable is a selector on the dataset, being used to split it into two sets,
Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis
www.iosrjournals.org 2 | Page
indicating the class identity. Among all the observations, there are 145 people belonging to the liver-disorder
group (corresponding to selector number 2) and 200 people belonging to the liver-normal group.
E-coli: - (Escherichia coli commonly abbreviated E. coli) is a Gram-negative, rod-shaped bacterium that is
commonly found in the lower intestine of warm-blooded organisms (endotherms). This dataset is consisting of 8
attributes and 336 instances. The class distribution of 8 attributes are classified into 8 classes
cp (cytoplasm) 143
im (inner membrane without signal sequence) 77
pp (perisplasm) 52
imU (inner membrane, uncleavable signal sequence) 35
om (outer membrane) 20
omL (outer membrane lipoprotein) 5
imL (inner membrane lipoprotein) 2
imS (inner membrane, cleavable signal sequence) 2
III. Classification algorithm:
Differentiation classification algorithms [6, 7] have been used for the performance evaluation, below are listed.
1: -j48 (C4.5): J48 is an implementation of C4.5 [8] that builds decision trees from a set of training data in
the same way as ID3, using the concept of Information Entropy. The training data is a set S = s1, s2... of already
classified samples. Each sample si = x1, x2... is a vector where x1, x2… represent attributes or features of the
sample. Decision tree are efficient to use and display good accuracy for large amount of data. At each node of
the tree, C4.5 chooses one attribute of the data that most effectively splits its set of samples into subsets enriched
in one class or the other.
2: -Naive Bayes: -a naive Bayes classifier assumes that the presence or absence of a particular feature is
unrelated to the presence or absence of any other feature, given the class variable. Bayesian belief networks are
graphical models, which unlikely naive Bayesian classifier; allow the representation of dependencies among
subsets of attributes[10]. Bayesian belief networks can also be used for classification. A simplified assumption:
attributes are conditionally independent:
Where V are the data samples, vi is the value of attribute i on the sample and Cjis the j-th class. Greatly reduces
the computation cost, only count the class distribution.
3: - k-nearest neighborhood:-
The k-NN algorithm for continuous-valued target functions Calculate the mean values of the k nearest
neighbors Distance-weighted nearest neighbor algorithm Weight the contribution of each of the k neighbors
according to their distance to the query point x
q
g giving greater weight to closer neighbors Similarly, for real-
valued target functions. Robust to noisy data by averaging k-nearest neighbors.
The Euclidean distance between two points, X=(x1, x2,...................xn) and Y=(y1, y2,...................yn) is


n
i
jijj CvCC PPVP
1
)|()()|(
Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis
www.iosrjournals.org 3 | Page
4: -Neural Network:-
Neural networks have emerged as an important tool for classification. The recent vast research
activities in neural classification have established that neural networks are a promising alternative to various
conventional classification methods. The advantage of neural networks lies in the following theoretical aspects.
First, neural networks [9] are data driven self-adaptive methods in that they can adjust themselves to the data
without any explicit specification of functional or distributional form for the underlying model.
5: -Support Vector Machine: -
A new classification method for both linear and non linear data.It uses a nonlinear mapping to
transform the original training data into a higher dimension. With the new dimension, it searches for the linear
optimal separating hyper plane (i.e., “decision boundary”). With an appropriate nonlinear mapping to a
sufficiently high dimension, data from two classes can always be separated by a hyper plane SVM finds this
hyper plane using support vectors (“essential” training tuples) and margins (defined by the support vectors).
IV. Implementation
The Following tools and technologies used for the Experimentation is Rapid Miner [14] is an open
source-learning environment for data mining and machine learning. This environment can be used to extract
meaning from a dataset. There are hundreds of machine learning operators to choose from, helpful pre and post
processing operators, descriptive graphic visualizations, and many other features. It is available as a stand-alone
application for data analysis and as a data-mining engine for the integration into own products.
Weka[15] is a collection of machine learning algorithms for data mining tasks. The algorithms can
either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-
processing, classification, regression, clustering, association rules, and visualization. It is also well suited for
developing new machine learning schemes. Weka provides access to SQL databases using Java Database
Connectivity and can process the result returned by a database query.
JFreeChart [16] is an open-source framework for the programming language Java; it is an open source library
Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis
www.iosrjournals.org 4 | Page
available for Java that allows users to easily generate graphs and charts. It is particularly effective for when a
user needs to regenerate graphs that change on a frequent basis. JFreeChart supports pie charts (2D and 3D), bar
charts, line charts, scatter plots, time series charts, and high-low-open- close charts.
V. Experiment and Results:
The different algorithms are applied to different dataset and the performance is evaluated [11, 12]. The
information gain measure is used to select the test attribute at each node in the tree. Such a measure is referred
to as an attribute selection measure or a measure of the goodness of split. The attribute with the highest
information gain is chosen as the test attribute for the current node. This attribute minimizes the information
needed to classify the samples in the resulting partitions and reflects the least randomness or “impurity” in these
partitions.
Information gain
If a set S of records is partitioned into classes C1, C2, C3. . . Cion the basis of the categorical attribute,
then the information needed to identify the class of an element of S is denoted by:
...........................
Where pis the probability distribution of the partition Ci.Thus Entropy can be written like this
Thus the information gain in performing a branching with attribute A can be calculated with this equation.
Gain (A) =I (S)-E (A)
After the preprocessing of data, it comes computing the information gain necessary to construct the decision
tree. After these information gains are computed, the decision tree is constructed.
Gain (sepal length) =0.44
Gain (sepal width) =0.0
Gain (petal length) =1
Gain (petal width) =1
Fig: 1: -J-48 Decision tree for iris dataset
The feature selection can apply to remove the zero weight attributes; by removing those attributes the
performance can be improved. The most useful part of this is attribute selection.
Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis
www.iosrjournals.org 5 | Page
1. Select relevant attributes
2. Remove redundant and/or irrelevant attributes.
The screen shots and results are displayed below.
Datasets Classification Performance
Iris
J48 95.33%
Neural net 97.33%
E-coil
Decision tree 80.06%
Neural net 83.01%
Liver Disorder
Neural net 67.59%
J48 69.58%
Table1: Classification results by using different Algorithms
Fig: 2: -Classification result Pie chart for liver Fig: 3:-Improved classification result for liver disorder
Disorder dataset. Using neural net with removal of zero weight
Fig: 4: -Classification result for e-coli dataset
using neural net Fig: 5: -pie chart of e-coil dataset
Fig: 6: -Classification result for iris dataset using neural net Fig: 7: -Pie chart for iris dataset
Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis
www.iosrjournals.org 6 | Page
Predictive analysis can be used to predict the new class label. By using the different algorithms but Neural
Network and K-Nearest Neighborhood was the efficient.
Datasets Classification Performance
Of Prediction
Iris
Naive Bayes Incorrect
Neural net Correct
E-coil
SVM Incorrect
KNN Correct
Liver Disorder
Decision tree Incorrect
KNN Correct
Table2. Displays the Predictions of new class label
VI. Conclusion:
Classification is one of the most useful techniques in data mining to build classification models from an
input data set. Choosing the right data mining technique is another important fact to overcome the problems.
The results of different classification algorithm are compared and to find which algorithm generates the
effective results and graphically displays the results. Compare the performance of the different classification
algorithms when applied on different datasets.
References:
[1]. SERHAT ÖZEKES and A.YILMAZ ÇAMURCU:” CLASSIFICATION AND PREDICTION IN A DATA MINING
APPLICATION “Journal of Marmara for Pure and Applied Sciences, 18 (2002) 159-174 Marmara University, Printed in Turkey.
[2]. Kaushik H and Raviya Biren Gajjar ,”Performance Evaluation of Different Data Mining Classification Algorithm Using
WEKA”,Indian Journal of Research(PARIPEX) Volume : 2 | Issue : 1 | January 2013 ISSN - 2250-1991.
[3]. WaiHoAu,KeithC.C.Chan;XinYao.ANovelEvolutionaryDataMiningAlgorithmwithApplicationstoChurn Prediction. IEEE
TRANSACTIONS ON EVOLUTIONARY COMPUTATION, Vol. 7, No. 6, Dec 2003, PP: 532- 545
[4]. Reza Allahyari Soeini and Keyvan Vahidy Rodpysh: “Evaluations of Data Mining Methods in Order to Provide the Optimum
Method for Customer Churn Prediction: Case Study Insurance Industry.”2012 International Conference on Information and
Computer Applications (ICICA 2012)IPCSI vol. 24 (2012) © (2012) IACSIT Press, Singapore.
[5]. Surjeet Kumar Yadav and Saurabh Pal:” Data Mining: A Prediction for Performance Improvement of Engineering Students using
Classification”World of Computer Science and Information Technology Journal (WCSIT) ISSN: 2221-0741
 Vol. 2, No. 2, 51-56,
2012.
[6]. Zhong, N.; Zhou, L.: “Methodologies for Knowledge Discovery and Data Mining”, The Third Pacific-Asia Conference, Pakdd-99,
Beijing, China, April 26-28, 1999; Proceedings, Springer Verlag, (1999).
[7]. Raj Kumar, Dr. Rajesh Verma:” Classification Algorithms for Data Mining: A Survey”International Journal of Innovations in
Engineering and Technology (IJIET)
[8]. Fayyad, U.: “Mining Databases: Towards Algorithms for Knowledge Discovery”, IEEE Bulletin of the Technical Committee on Data
Engineering, 21 (1) (1998) 41-48.
[9]. Ling Liu:” From Data Privacy to Location Privacy: Models and Algorithms”September 23-28, 2007, Vienna, Austria.
[10]. Qi Li and Donald W. Tufts:,”Principal Feature Classification ” IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO.
1, JANUARY 1997.
[11]. Deen, Ahmad , classification based on association-rule mining techniques: a general survey and empirical comparative evaluation ,
Ubiquitous Computing and Communication Journal vol 5 no 3
[12]. Sanjay D. Sawaitul, Prof. K. P. Wagh, Dr. P. N. Chatur,” Classification and Prediction of Future Weather by using Back Propagation
Algorithm-An Approach”,International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com
(ISSN 2250-2459, Volume 2, Issue 1, January 2012)
[13]. Qasem A. Al-Radaideh & Eman Al Nagi,” Using Data Mining Techniques to Build a Classification Model for Predicting Employees
Performance”,(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 2, 2012.
[14]. RapidMiner is an open source-learning environment for data mining and machine learning. http://rapid-i.com/content/view/181/190/
[15]. Weka is a collection of machine learning algorithms for data mining tasks http://www.cs.waikato.ac.nz/ml/weka/downloading.html
[16] JFreeChart, JFreeChart is an open source library available for Java that allows users to easily generate graphs and charts-
http://www.jfree.org/index.html

More Related Content

What's hot

What's hot (20)

Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
Input-Buffering
Input-BufferingInput-Buffering
Input-Buffering
 
Three Address code
Three Address code Three Address code
Three Address code
 
Backtracking
BacktrackingBacktracking
Backtracking
 
Density based methods
Density based methodsDensity based methods
Density based methods
 
Buffering.pptx
Buffering.pptxBuffering.pptx
Buffering.pptx
 
Recognition-of-tokens
Recognition-of-tokensRecognition-of-tokens
Recognition-of-tokens
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS
 
Computer networks lan
Computer networks lanComputer networks lan
Computer networks lan
 
Code Optimization
Code OptimizationCode Optimization
Code Optimization
 
Cache optimization
Cache optimizationCache optimization
Cache optimization
 
Congestion avoidance in TCP
Congestion avoidance in TCPCongestion avoidance in TCP
Congestion avoidance in TCP
 
Congestion control
Congestion controlCongestion control
Congestion control
 
Presentation on risc pipeline
Presentation on risc pipelinePresentation on risc pipeline
Presentation on risc pipeline
 
Multiple Access Protocal
Multiple Access ProtocalMultiple Access Protocal
Multiple Access Protocal
 
Topic : X.25, Frame relay and ATM
Topic :  X.25, Frame relay and ATMTopic :  X.25, Frame relay and ATM
Topic : X.25, Frame relay and ATM
 
Compiler construction tools
Compiler construction toolsCompiler construction tools
Compiler construction tools
 
DeadLock in Operating-Systems
DeadLock in Operating-SystemsDeadLock in Operating-Systems
DeadLock in Operating-Systems
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
Code generation
Code generationCode generation
Code generation
 

Viewers also liked

Intensity Non-uniformity Correction for Image Segmentation
Intensity Non-uniformity Correction for Image SegmentationIntensity Non-uniformity Correction for Image Segmentation
Intensity Non-uniformity Correction for Image SegmentationIOSR Journals
 
Implementation of Cellular IP and Its Performance Analysis
Implementation of Cellular IP and Its Performance AnalysisImplementation of Cellular IP and Its Performance Analysis
Implementation of Cellular IP and Its Performance AnalysisIOSR Journals
 
A Study on Recent Trends and Developments in Intrusion Detection System
A Study on Recent Trends and Developments in Intrusion Detection SystemA Study on Recent Trends and Developments in Intrusion Detection System
A Study on Recent Trends and Developments in Intrusion Detection SystemIOSR Journals
 
College Collaboration Portal with Training and Placement
College Collaboration Portal with Training and PlacementCollege Collaboration Portal with Training and Placement
College Collaboration Portal with Training and PlacementIOSR Journals
 
Relational Analysis of Software Developer’s Quality Assures
Relational Analysis of Software Developer’s Quality AssuresRelational Analysis of Software Developer’s Quality Assures
Relational Analysis of Software Developer’s Quality AssuresIOSR Journals
 
Smart Way to Track the Location in Android Operating System
Smart Way to Track the Location in Android Operating SystemSmart Way to Track the Location in Android Operating System
Smart Way to Track the Location in Android Operating SystemIOSR Journals
 
Adaptability of Distribution Automation System to Electric Power Quality Moni...
Adaptability of Distribution Automation System to Electric Power Quality Moni...Adaptability of Distribution Automation System to Electric Power Quality Moni...
Adaptability of Distribution Automation System to Electric Power Quality Moni...IOSR Journals
 
Enabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerEnabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerIOSR Journals
 
Diurnal Effects on Satellite Network Performance Measured In Tropical Region
Diurnal Effects on Satellite Network Performance Measured In Tropical RegionDiurnal Effects on Satellite Network Performance Measured In Tropical Region
Diurnal Effects on Satellite Network Performance Measured In Tropical RegionIOSR Journals
 
Concepts and Derivatives of Web Services
Concepts and Derivatives of Web ServicesConcepts and Derivatives of Web Services
Concepts and Derivatives of Web ServicesIOSR Journals
 
Vegetables detection from the glossary shop for the blind.
Vegetables detection from the glossary shop for the blind.Vegetables detection from the glossary shop for the blind.
Vegetables detection from the glossary shop for the blind.IOSR Journals
 
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...IOSR Journals
 
A Proportional-Integral-Derivative Control Scheme of Mobile Robotic platforms...
A Proportional-Integral-Derivative Control Scheme of Mobile Robotic platforms...A Proportional-Integral-Derivative Control Scheme of Mobile Robotic platforms...
A Proportional-Integral-Derivative Control Scheme of Mobile Robotic platforms...IOSR Journals
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data MiningIOSR Journals
 

Viewers also liked (20)

Intensity Non-uniformity Correction for Image Segmentation
Intensity Non-uniformity Correction for Image SegmentationIntensity Non-uniformity Correction for Image Segmentation
Intensity Non-uniformity Correction for Image Segmentation
 
Implementation of Cellular IP and Its Performance Analysis
Implementation of Cellular IP and Its Performance AnalysisImplementation of Cellular IP and Its Performance Analysis
Implementation of Cellular IP and Its Performance Analysis
 
A Study on Recent Trends and Developments in Intrusion Detection System
A Study on Recent Trends and Developments in Intrusion Detection SystemA Study on Recent Trends and Developments in Intrusion Detection System
A Study on Recent Trends and Developments in Intrusion Detection System
 
College Collaboration Portal with Training and Placement
College Collaboration Portal with Training and PlacementCollege Collaboration Portal with Training and Placement
College Collaboration Portal with Training and Placement
 
Relational Analysis of Software Developer’s Quality Assures
Relational Analysis of Software Developer’s Quality AssuresRelational Analysis of Software Developer’s Quality Assures
Relational Analysis of Software Developer’s Quality Assures
 
Smart Way to Track the Location in Android Operating System
Smart Way to Track the Location in Android Operating SystemSmart Way to Track the Location in Android Operating System
Smart Way to Track the Location in Android Operating System
 
Adaptability of Distribution Automation System to Electric Power Quality Moni...
Adaptability of Distribution Automation System to Electric Power Quality Moni...Adaptability of Distribution Automation System to Electric Power Quality Moni...
Adaptability of Distribution Automation System to Electric Power Quality Moni...
 
L0126598103
L0126598103L0126598103
L0126598103
 
Enabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerEnabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud Server
 
Diurnal Effects on Satellite Network Performance Measured In Tropical Region
Diurnal Effects on Satellite Network Performance Measured In Tropical RegionDiurnal Effects on Satellite Network Performance Measured In Tropical Region
Diurnal Effects on Satellite Network Performance Measured In Tropical Region
 
A0420104
A0420104A0420104
A0420104
 
Concepts and Derivatives of Web Services
Concepts and Derivatives of Web ServicesConcepts and Derivatives of Web Services
Concepts and Derivatives of Web Services
 
A0420104
A0420104A0420104
A0420104
 
Vegetables detection from the glossary shop for the blind.
Vegetables detection from the glossary shop for the blind.Vegetables detection from the glossary shop for the blind.
Vegetables detection from the glossary shop for the blind.
 
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
 
M017147275
M017147275M017147275
M017147275
 
A Proportional-Integral-Derivative Control Scheme of Mobile Robotic platforms...
A Proportional-Integral-Derivative Control Scheme of Mobile Robotic platforms...A Proportional-Integral-Derivative Control Scheme of Mobile Robotic platforms...
A Proportional-Integral-Derivative Control Scheme of Mobile Robotic platforms...
 
B012660611
B012660611B012660611
B012660611
 
C017161925
C017161925C017161925
C017161925
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 

Similar to Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis

E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...IRJET Journal
 
Classification Of Iris Plant Using Feedforward Neural Network
Classification Of Iris Plant Using Feedforward Neural NetworkClassification Of Iris Plant Using Feedforward Neural Network
Classification Of Iris Plant Using Feedforward Neural Networkirjes
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...ijsc
 
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCERKNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCERcscpconf
 
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETSURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
 
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...IRJET Journal
 
Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools Drjabez
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey papereSAT Publishing House
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A ReviewIOSRjournaljce
 
An Approach for IRIS Plant Classification Using Neural Network
An Approach for IRIS Plant Classification Using Neural Network  An Approach for IRIS Plant Classification Using Neural Network
An Approach for IRIS Plant Classification Using Neural Network ijsc
 
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORKAN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORKijsc
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniqueseSAT Journals
 
Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for properIJDKP
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
 

Similar to Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis (20)

E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
E-Healthcare monitoring System for diagnosis of Heart Disease using Machine L...
 
Classification Of Iris Plant Using Feedforward Neural Network
Classification Of Iris Plant Using Feedforward Neural NetworkClassification Of Iris Plant Using Feedforward Neural Network
Classification Of Iris Plant Using Feedforward Neural Network
 
IJET-V2I6P32
IJET-V2I6P32IJET-V2I6P32
IJET-V2I6P32
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
 
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCERKNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER
 
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETSURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
 
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...
IDENTIFICATION OF DIFFERENT SPECIES OF IRIS FLOWER USING MACHINE LEARNING ALG...
 
Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools Anomaly detection by using CFS subset and neural network with WEKA tools
Anomaly detection by using CFS subset and neural network with WEKA tools
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey paper
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A Review
 
An Approach for IRIS Plant Classification Using Neural Network
An Approach for IRIS Plant Classification Using Neural Network  An Approach for IRIS Plant Classification Using Neural Network
An Approach for IRIS Plant Classification Using Neural Network
 
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORKAN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
AN APPROACH FOR IRIS PLANT CLASSIFICATION USING NEURAL NETWORK
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
[IJET-V2I3P21] Authors: Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniques
 
Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for proper
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 

Recently uploaded (20)

Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 10, Issue 6 (May. - Jun. 2013), PP 01-06 www.iosrjournals.org www.iosrjournals.org 1 | Page Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis Syeda Farha Shazmeen1 , Mirza Mustafa Ali Baig2 , M.Reena Pawar3 1, 2, 3 Department of Information Technology, Balaji Institute of Technology and Science, Warangal, A.P, India, Abstract: Data mining is the knowledge discovery process by analyzing the large volumes of data from various perspectives and summarizing it into useful information; data mining has become an essential component in various fields of human life. It is used to identify hidden patterns in a large data set. Classification techniques are supervised learning techniques that classify data item into predefined class label. It is one of the most useful techniques in data mining to build classification models from an input data set; these techniques commonly build models that are used to predict future data trends. In this paper we have worked with different data mining applications and various classification algorithms, these algorithms have been applied on different dataset to find out the efficiency of the algorithm and improve the performance by applying data preprocessing techniques and feature selection and also prediction of new class labels. Keywords: Classification, Mining Techniques, Algorithms. I. Introduction Data miming consists of a set of techniques that can be used to extract relevant and interesting knowledge from data. Data mining has several tasks such as association rule mining, classification and prediction, and clustering. Classification [1] is one of the most useful techniques in data mining to build classification models from an input data set. The used classification techniques commonly build models that are used to predict future data trends [2, 3]. Model construction: describing a set of predetermined classes 1. Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute 2. The set of tuples used for model construction: training set. 3. The model is represented as classification rules, decision trees, or mathematical formulae. 4. Accuracy rate is the percentage of test set samples that are correctly classified by the model. 5. Test set is independent of training set, otherwise over-fitting will occur. The goal of the predictive models is to construct a model by using the results of the known data and is to predict the results of unknown data sets by using the constructed model [13]. In the classification and regression models the following techniques are mainly used 1. Decision Trees; 2. Artificial Neural Networks 3. Support Vector Machine 4. K-Nearest Neighbor 5. Navie-Bayes. Before classification, data preprocessing techniques [4, 5] like data cleaning, data selection is applied this techniques improve the efficiency of the algorithm to classify the data correctly. II. Data Sets used in the application: IRIS Datasets: -we make use of a large database „Fisher‟s Iris Dataset‟ containing 5 attributes and 150 instances. It consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample; they are the length and the width of sepal and petal, in centimeters. Based on the combination of the four features, Fisher developed a linear discriminant model to distinguish the species from each other classification method to identify the class of Iris flower as Iris- setosa, Iris-versicolor or Iris-virginica using data mining classification algorithm. Liver Disorder: -The observations in the dataset consist of 7 variables and 345 observed instances. The first 5 variables are measurements taken by blood tests that are thought to be sensitive to liver disorders and might arise from excessive alcohol consumption. The sixth variable is a sort of selector variable. The subjects are single male individuals. The seventh variable is a selector on the dataset, being used to split it into two sets,
  • 2. Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis www.iosrjournals.org 2 | Page indicating the class identity. Among all the observations, there are 145 people belonging to the liver-disorder group (corresponding to selector number 2) and 200 people belonging to the liver-normal group. E-coli: - (Escherichia coli commonly abbreviated E. coli) is a Gram-negative, rod-shaped bacterium that is commonly found in the lower intestine of warm-blooded organisms (endotherms). This dataset is consisting of 8 attributes and 336 instances. The class distribution of 8 attributes are classified into 8 classes cp (cytoplasm) 143 im (inner membrane without signal sequence) 77 pp (perisplasm) 52 imU (inner membrane, uncleavable signal sequence) 35 om (outer membrane) 20 omL (outer membrane lipoprotein) 5 imL (inner membrane lipoprotein) 2 imS (inner membrane, cleavable signal sequence) 2 III. Classification algorithm: Differentiation classification algorithms [6, 7] have been used for the performance evaluation, below are listed. 1: -j48 (C4.5): J48 is an implementation of C4.5 [8] that builds decision trees from a set of training data in the same way as ID3, using the concept of Information Entropy. The training data is a set S = s1, s2... of already classified samples. Each sample si = x1, x2... is a vector where x1, x2… represent attributes or features of the sample. Decision tree are efficient to use and display good accuracy for large amount of data. At each node of the tree, C4.5 chooses one attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. 2: -Naive Bayes: -a naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. Bayesian belief networks are graphical models, which unlikely naive Bayesian classifier; allow the representation of dependencies among subsets of attributes[10]. Bayesian belief networks can also be used for classification. A simplified assumption: attributes are conditionally independent: Where V are the data samples, vi is the value of attribute i on the sample and Cjis the j-th class. Greatly reduces the computation cost, only count the class distribution. 3: - k-nearest neighborhood:- The k-NN algorithm for continuous-valued target functions Calculate the mean values of the k nearest neighbors Distance-weighted nearest neighbor algorithm Weight the contribution of each of the k neighbors according to their distance to the query point x q g giving greater weight to closer neighbors Similarly, for real- valued target functions. Robust to noisy data by averaging k-nearest neighbors. The Euclidean distance between two points, X=(x1, x2,...................xn) and Y=(y1, y2,...................yn) is   n i jijj CvCC PPVP 1 )|()()|(
  • 3. Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis www.iosrjournals.org 3 | Page 4: -Neural Network:- Neural networks have emerged as an important tool for classification. The recent vast research activities in neural classification have established that neural networks are a promising alternative to various conventional classification methods. The advantage of neural networks lies in the following theoretical aspects. First, neural networks [9] are data driven self-adaptive methods in that they can adjust themselves to the data without any explicit specification of functional or distributional form for the underlying model. 5: -Support Vector Machine: - A new classification method for both linear and non linear data.It uses a nonlinear mapping to transform the original training data into a higher dimension. With the new dimension, it searches for the linear optimal separating hyper plane (i.e., “decision boundary”). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyper plane SVM finds this hyper plane using support vectors (“essential” training tuples) and margins (defined by the support vectors). IV. Implementation The Following tools and technologies used for the Experimentation is Rapid Miner [14] is an open source-learning environment for data mining and machine learning. This environment can be used to extract meaning from a dataset. There are hundreds of machine learning operators to choose from, helpful pre and post processing operators, descriptive graphic visualizations, and many other features. It is available as a stand-alone application for data analysis and as a data-mining engine for the integration into own products. Weka[15] is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre- processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes. Weka provides access to SQL databases using Java Database Connectivity and can process the result returned by a database query. JFreeChart [16] is an open-source framework for the programming language Java; it is an open source library
  • 4. Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis www.iosrjournals.org 4 | Page available for Java that allows users to easily generate graphs and charts. It is particularly effective for when a user needs to regenerate graphs that change on a frequent basis. JFreeChart supports pie charts (2D and 3D), bar charts, line charts, scatter plots, time series charts, and high-low-open- close charts. V. Experiment and Results: The different algorithms are applied to different dataset and the performance is evaluated [11, 12]. The information gain measure is used to select the test attribute at each node in the tree. Such a measure is referred to as an attribute selection measure or a measure of the goodness of split. The attribute with the highest information gain is chosen as the test attribute for the current node. This attribute minimizes the information needed to classify the samples in the resulting partitions and reflects the least randomness or “impurity” in these partitions. Information gain If a set S of records is partitioned into classes C1, C2, C3. . . Cion the basis of the categorical attribute, then the information needed to identify the class of an element of S is denoted by: ........................... Where pis the probability distribution of the partition Ci.Thus Entropy can be written like this Thus the information gain in performing a branching with attribute A can be calculated with this equation. Gain (A) =I (S)-E (A) After the preprocessing of data, it comes computing the information gain necessary to construct the decision tree. After these information gains are computed, the decision tree is constructed. Gain (sepal length) =0.44 Gain (sepal width) =0.0 Gain (petal length) =1 Gain (petal width) =1 Fig: 1: -J-48 Decision tree for iris dataset The feature selection can apply to remove the zero weight attributes; by removing those attributes the performance can be improved. The most useful part of this is attribute selection.
  • 5. Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis www.iosrjournals.org 5 | Page 1. Select relevant attributes 2. Remove redundant and/or irrelevant attributes. The screen shots and results are displayed below. Datasets Classification Performance Iris J48 95.33% Neural net 97.33% E-coil Decision tree 80.06% Neural net 83.01% Liver Disorder Neural net 67.59% J48 69.58% Table1: Classification results by using different Algorithms Fig: 2: -Classification result Pie chart for liver Fig: 3:-Improved classification result for liver disorder Disorder dataset. Using neural net with removal of zero weight Fig: 4: -Classification result for e-coli dataset using neural net Fig: 5: -pie chart of e-coil dataset Fig: 6: -Classification result for iris dataset using neural net Fig: 7: -Pie chart for iris dataset
  • 6. Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis www.iosrjournals.org 6 | Page Predictive analysis can be used to predict the new class label. By using the different algorithms but Neural Network and K-Nearest Neighborhood was the efficient. Datasets Classification Performance Of Prediction Iris Naive Bayes Incorrect Neural net Correct E-coil SVM Incorrect KNN Correct Liver Disorder Decision tree Incorrect KNN Correct Table2. Displays the Predictions of new class label VI. Conclusion: Classification is one of the most useful techniques in data mining to build classification models from an input data set. Choosing the right data mining technique is another important fact to overcome the problems. The results of different classification algorithm are compared and to find which algorithm generates the effective results and graphically displays the results. Compare the performance of the different classification algorithms when applied on different datasets. References: [1]. SERHAT ÖZEKES and A.YILMAZ ÇAMURCU:” CLASSIFICATION AND PREDICTION IN A DATA MINING APPLICATION “Journal of Marmara for Pure and Applied Sciences, 18 (2002) 159-174 Marmara University, Printed in Turkey. [2]. Kaushik H and Raviya Biren Gajjar ,”Performance Evaluation of Different Data Mining Classification Algorithm Using WEKA”,Indian Journal of Research(PARIPEX) Volume : 2 | Issue : 1 | January 2013 ISSN - 2250-1991. [3]. WaiHoAu,KeithC.C.Chan;XinYao.ANovelEvolutionaryDataMiningAlgorithmwithApplicationstoChurn Prediction. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, Vol. 7, No. 6, Dec 2003, PP: 532- 545 [4]. Reza Allahyari Soeini and Keyvan Vahidy Rodpysh: “Evaluations of Data Mining Methods in Order to Provide the Optimum Method for Customer Churn Prediction: Case Study Insurance Industry.”2012 International Conference on Information and Computer Applications (ICICA 2012)IPCSI vol. 24 (2012) © (2012) IACSIT Press, Singapore. [5]. Surjeet Kumar Yadav and Saurabh Pal:” Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification”World of Computer Science and Information Technology Journal (WCSIT) ISSN: 2221-0741
 Vol. 2, No. 2, 51-56, 2012. [6]. Zhong, N.; Zhou, L.: “Methodologies for Knowledge Discovery and Data Mining”, The Third Pacific-Asia Conference, Pakdd-99, Beijing, China, April 26-28, 1999; Proceedings, Springer Verlag, (1999). [7]. Raj Kumar, Dr. Rajesh Verma:” Classification Algorithms for Data Mining: A Survey”International Journal of Innovations in Engineering and Technology (IJIET) [8]. Fayyad, U.: “Mining Databases: Towards Algorithms for Knowledge Discovery”, IEEE Bulletin of the Technical Committee on Data Engineering, 21 (1) (1998) 41-48. [9]. Ling Liu:” From Data Privacy to Location Privacy: Models and Algorithms”September 23-28, 2007, Vienna, Austria. [10]. Qi Li and Donald W. Tufts:,”Principal Feature Classification ” IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 1, JANUARY 1997. [11]. Deen, Ahmad , classification based on association-rule mining techniques: a general survey and empirical comparative evaluation , Ubiquitous Computing and Communication Journal vol 5 no 3 [12]. Sanjay D. Sawaitul, Prof. K. P. Wagh, Dr. P. N. Chatur,” Classification and Prediction of Future Weather by using Back Propagation Algorithm-An Approach”,International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 1, January 2012) [13]. Qasem A. Al-Radaideh & Eman Al Nagi,” Using Data Mining Techniques to Build a Classification Model for Predicting Employees Performance”,(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 2, 2012. [14]. RapidMiner is an open source-learning environment for data mining and machine learning. http://rapid-i.com/content/view/181/190/ [15]. Weka is a collection of machine learning algorithms for data mining tasks http://www.cs.waikato.ac.nz/ml/weka/downloading.html [16] JFreeChart, JFreeChart is an open source library available for Java that allows users to easily generate graphs and charts- http://www.jfree.org/index.html