Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.3, September 2016
DOI:10.5121/mlaij.2016.3303
MACHINE LEARNING TOOLBOX
Dr. Pankaj Agarwal Professor, Akshay Bhasin, Rohit Keshwani, Aman Verma and
Suryansh Kaushik
B.Tech CS 4th Yr Department of Computer Science, IMS Engineering College,
Ghaziabad, U.P, India
ABSTRACT
This paper reviews and comparatively evaluates a few well-known machine learning
algorithms in terms of their suitability and code performance on a given data set of any size. We
describe the Machine Learning ToolBox that we have built using the Python programming language. The
toolbox consists of supervised classification algorithms: Naïve Bayes,
Decision Trees, SVM, K-nearest Neighbors and Neural Network (Backpropagation). The algorithms are
tested on the iris and diabetes datasets and are compared on the basis of their accuracy under different
conditions. However, using our tool one can apply any of the implemented ML algorithms to any dataset of
any size. The main goal of building the toolbox is to provide users with a platform to test their datasets on
different machine learning algorithms and use the accuracy results to determine which algorithm fits the
data best. The toolbox allows the user to choose a dataset in either structured or
unstructured form and then to choose the features to use for training the machine. We
give our concluding remarks on the performance of the implemented algorithms based on experimental
analysis.
KEYWORDS
Machine Learning, Neural Networks, Backpropagation, Decision Tree (C4.5), SVM (Support Vector
Machines), K-means Clustering, K-nearest Neighbors.
1. INTRODUCTION
Machine learning is the ability of a machine to learn from a set of data and make
inferences on the basis of that data. As explained by Tom Mitchell [1], machine learning can be
best defined as:
“A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves
with experience E.”
The beauty of machine learning models is that when they are exposed to new data, they are able
to adapt independently. There is still a need for a quick way to try different datasets on different
algorithms with different parameters. A tool of this kind can be
useful to researchers, students, teachers and professionals. The goal of the machine learning
toolbox is to provide an easy way to upload datasets in text file format and apply various
machine learning algorithms. The toolbox provides options such as
using the same dataset for both training and testing or splitting the training and testing
datasets in some proportion. The toolbox also provides algorithm-specific options,
such as fields for the number of hidden neurons and the
learning rate in the case of a neural network. This way the user can interact with the toolbox easily, try
out different settings, and see which configuration of an algorithm suits the dataset well.
Machine Learning Toolbox is a research project meant to provide the user with a
complete package for finding out how different machine learning approaches
and algorithms work on a particular dataset. The toolbox allows the user to provide
a dataset in either structured or unstructured form and to choose the features
to use for training the machine. In this manner, the user has complete control over the
comparison process.
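The two evaluation options mentioned above (testing on the training data, or splitting the data in some proportion) can be sketched in a few lines of Python. This is only an illustration and not the toolbox's actual code; the function name `split_dataset`, the `train_fraction` parameter and the fixed seed are assumptions made for the example.

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    train_fraction=1.0 mimics the "same data for training and testing"
    option: both returned lists then contain every row.
    """
    if not 0.0 < train_fraction <= 1.0:
        raise ValueError("train_fraction must be in (0, 1]")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    if train_fraction == 1.0:
        return shuffled, list(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(150))                     # stand-in for 150 instances
train, test = split_dataset(rows, train_fraction=0.8)
print(len(train), len(test))                # 120 30
```

With 150 instances, an 80:20 split leaves 30 instances for testing and a 70:30 split leaves 45.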
2. SUPERVISED MACHINE LEARNING APPROACHES
Machine Learning ToolBox uses various supervised learning techniques. Supervised machine learning is the
search for algorithms that reason from externally supplied instances to produce a general
hypothesis, which then makes predictions about future instances.
In other words, the main aim of supervised learning is to build a model of the distribution of
class labels in terms of predictor features. The resulting classifier is then used to assign class
labels to testing instances where the values of the predictor features are known but the value of
the class label is unknown. In the present paper, we focus on the algorithms necessary to
do this. In particular, this work is concerned with classification problems in which the features
take continuous, real values and the output is one of a finite set of class labels [2]. Table 1 gives an example of the iris dataset
format, with four features expressed in real values and three possible output classes expressed
numerically.
Table 1 Example of Iris Data Set for applying supervised learning algorithm
Feature 1 | Feature 2 | Feature 3 | Feature 4 | Known output class
sepal length in cm | sepal width in cm | petal length in cm | petal width in cm | Iris Setosa / Iris Versicolour / Iris Virginica
Inductive machine learning is the process of creating a classifier model that can later be used to
make predictions on new data. The process of applying supervised machine learning to a real-world
problem is described in Figure 1.
Figure 1 Supervised Learning Flow Diagram
2.1 Naïve Bayes Classifier
Naive Bayes techniques are a set of supervised learning algorithms based on Bayes’ theorem
with the assumption of conditional independence between every pair of features given the class. Naive Bayes is a simple
method for constructing classifiers: the models assign class labels to problem instances,
represented as vectors of feature values, where the class labels are drawn from some finite
set [3]. Naive Bayes requires only a small amount of training data to estimate the parameters necessary
for classification.
Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes’ theorem
states the following relationship:
$$P(y \mid x_1, \ldots, x_n) = \frac{P(y)\, P(x_1, \ldots, x_n \mid y)}{P(x_1, \ldots, x_n)}$$
The different naive Bayes classifiers differ mainly by the assumptions they make regarding the
distribution of $P(x_i \mid y)$. Naive Bayes learners and classifiers can be extremely fast compared
to more sophisticated methods.
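As an illustration of how such a classifier can be built, the following sketch assumes that each feature follows a Gaussian distribution within each class. The paper does not say which naive Bayes variant the toolbox uses, so the Gaussian choice, the function names and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, features):
    """Return the class with the largest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(features, means, variances):
            # Log of the Gaussian density; features are treated as independent.
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data, made up for illustration: two features, two classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))   # a
print(predict_gaussian_nb(model, [3.1, 0.5]))   # b
```

Working with log probabilities avoids numerical underflow when many per-feature likelihoods are multiplied together.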
2.2 K-nearest Neighbors Classifier
KNN is a very popular classification algorithm that performs well
and takes little time to train. The k-Nearest Neighbors algorithm (or k-NN for short)
is a non-parametric method used for classification and regression. k-NN is a type of instance-based
learning, or lazy learning, where the function is only approximated locally and all
computation is deferred until classification. The k-NN algorithm is among the simplest of all
machine-learning algorithms.
However, the KNN algorithm has a few disadvantages. First, all the neighbors are treated
equally in the standard KNN algorithm. Second, the imbalanced data problem can occur:
large classes tend to have a better chance of winning the vote [4].
The algorithm has two steps:
Step 1) Search for the K training instances that are closest to the unknown instance.
Step 2) Pick the most commonly occurring class label among these K instances.
In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test
point) is classified by assigning the label which is most frequent among the k training samples
nearest to that query point. A commonly used distance metric for continuous variables is
Euclidean distance. For discrete variables, such as in text classification, another metric can be
used, such as the overlap metric (or Hamming distance).
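The two steps can be sketched directly in Python using Euclidean distance and a majority vote. The function name `knn_predict` and the toy data are hypothetical and only serve the illustration; this is not the toolbox's implementation.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: sort the training instances by Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Step 2: pick the most common label among the k closest instances.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up data with two features and two classes.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [4.0, 4.2], [4.1, 3.9]]
train_y = ["setosa", "setosa", "setosa", "virginica", "virginica"]

print(knn_predict(train_X, train_y, [1.1, 1.0], k=3))   # setosa
print(knn_predict(train_X, train_y, [4.0, 4.0], k=3))   # virginica
```

There is no training phase in this sketch: all the work happens at query time, which is what "lazy learning" refers to.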
2.3 Decision Tree Classifier
Decision Tree Classifier is a simple and widely used classification technique. It solves
the classification problem by asking a series of questions about the attributes of a record. Each time it receives an answer, a follow-up
question is asked until a conclusion about the class label of the record is reached. A decision tree
is a tree-like structure in which decisions are taken based on the values of some attributes, and
further branching is done accordingly. A decision tree classifies a record by starting from
the root node and comparing the attribute values as specified in the nodes until a leaf node is
reached. The leaf node contains the class label. The aim is to partition a dataset into groups that are as
homogeneous as possible in terms of the variable to be predicted [5].
Creating an optimal decision tree is a key problem in decision tree classification. In general, many
different decision trees can be constructed from a given set of attributes, and
finding the optimal tree is computationally infeasible because of the exponential size of the
search space.
However, various efficient algorithms have been developed to construct a reasonably accurate,
suboptimal decision tree in a reasonable amount of time. These algorithms usually employ a
greedy strategy that grows a decision tree by making a series of locally optimal decisions about
which attribute to use for partitioning the data. For example, Hunt's algorithm, ID3, C4.5, CART and
SPRINT are greedy decision tree induction algorithms.
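A single greedy step can be sketched as follows: every candidate threshold on every feature is scored, and the one with the highest information gain is kept. Information gain is the criterion associated with ID3; whether the toolbox uses it is not stated, so the criterion, the function names and the toy data are assumptions for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature_index, threshold, gain) of the best binary split."""
    base = entropy(y)
    best = (None, None, 0.0)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(y)
            gain = base - remainder
            if gain > best[2]:
                best = (j, threshold, gain)
    return best

# Toy, made-up data: feature 0 separates the classes, feature 1 does not.
X = [[2.0, 7.0], [3.0, 1.0], [8.0, 6.0], [9.0, 2.0]]
y = ["a", "a", "b", "b"]
print(best_split(X, y))   # (0, 5.5, 1.0)
```

A full tree would apply `best_split` recursively to the left and right subsets until each subset is pure or some stopping rule is met.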
2.4 SVM Classifier
The support vector machine (SVM) is a training algorithm that learns a classification rule from the data
set; the trained classifier is then used to predict the class of a new sample. SVM is
based on the concept of decision planes that define decision boundaries; the points that form the
decision boundary between the classes are called support vectors and are treated as parameters [6]. An SVM
model is a representation of the examples as points in space, mapped so that the examples of the
separate categories are divided by a clear gap that is as wide as possible. New examples are then
mapped into that same space and predicted to belong to a category based on which side of the
gap they fall on. More formally, a support vector machine constructs a hyperplane or set of
hyperplanes in a high- or infinite-dimensional space, which can be used for classification,
regression, or other tasks as shown below.
Figure 2 SVM Illustration
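The geometric idea can be made concrete with a small sketch. It does not train an SVM; it takes a hand-picked separating hyperplane for made-up 2-D data and shows how the side of the hyperplane decides the class, which points lie on the margin (the support vectors), and how wide the gap is. The hyperplane, the data and the function names are all assumptions for the illustration.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Width of the gap between the two margin hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

# Toy, made-up data; labels are +1 / -1.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.5]]
y = [-1, -1, 1, 1]

# Hand-picked hyperplane x1 + x2 - 5 = 0 (not learned by an optimiser),
# scaled so that the closest points of each class satisfy y * f(x) = 1.
w, b = [1.0, 1.0], -5.0

# Support vectors are the points lying exactly on the margin.
support = [x for x, label in zip(X, y)
           if abs(label * decision_value(w, b, x) - 1.0) < 1e-9]

print(support)                       # [[2.0, 2.0], [3.0, 3.0]]
print(round(margin_width(w), 3))     # 1.414
print(classify(w, b, [0.5, 1.0]))    # -1
print(classify(w, b, [5.0, 3.0]))    # 1
```

An actual SVM solver searches for the `w` and `b` that make this gap as wide as possible while keeping every training point on the correct side.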
2.5 Artificial Neural Network Classifier
An Artificial Neural Network is a network of interconnected nodes, each representing some
value, connected to each other via weighted edges. This network is parallel in the
sense that the neurons are arranged in layers and neurons in adjacent layers are connected.
ANNs were made to mimic the working of a biological nervous system, where each neuron
contributes some amount to the final output.
Neural networks are made up of layers, which are in turn made up of a number of interconnected
nodes which contain an activation function. The nodes of the input layer are connected to the
nodes of the hidden layer with weighted edges. The hidden layers are linked to an output layer
where the answer is received. It is represented as:
Figure 3 Neural Network Illustration
These networks have been established to be universal approximators of continuous/discontinuous
functions and therefore they are suitable for usage where we have some information about the
input-output map to be approximated. A set of data (input-output information) is used for
training the network. Once the network has been trained, any input can be given and it will give
an output, which would correspond to the expected output from the approximated mapping.
The activation function used is the log-sigmoid function, which can be expressed as:
$$f(net) = \frac{1}{1 + e^{-net}}$$
where
$$net = \sum_i w_i x_i$$
The w array contains the weights of the edges and the x array contains the outputs of the previous layer.
For the hidden layer, x corresponds to the input of the network, while for the output layer, x
corresponds to the output of the hidden layer. The network is trained using the error
backpropagation algorithm. The weight update rule can be expressed as:
$$\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n)$$
where α is called the momentum constant, η is the learning rate, ∆wji(n) is the correction applied
to the synaptic weight connecting the output of neuron i to the input of neuron j at iteration n, δj(n)
is the local gradient at the nth iteration, and yi(n) is the function signal appearing at the output of
neuron i at iteration n. The acceptable percentage error is specified beforehand according to the problem, and
the network tries to converge to that error using the backpropagation technique. The
advantage of the neural network approach is that neural networks are robust to noisy data, adaptable,
nonparametric, robust with respect to node failure and empirically shown to work well for many
problem domains.
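The forward computation and the momentum-based weight correction described above can be sketched for a single layer as follows. The sketch assumes the usual log-sigmoid 1 / (1 + e^(-net)) and an update of the form ∆wji(n) = α ∆wji(n-1) + η δj(n) yi(n); the function names, the default values of η and α and the toy numbers are assumptions for illustration, not the toolbox's code.

```python
import math

def logsig(net):
    """Log-sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-net))

def layer_forward(weights, inputs):
    """weights[j][i] connects input i to neuron j; returns the neuron outputs."""
    return [logsig(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def momentum_update(weights, prev_delta, local_grad, inputs, eta=0.5, alpha=0.9):
    """Apply delta_w[j][i] = alpha * prev_delta[j][i] + eta * local_grad[j] * inputs[i]."""
    new_delta = [[alpha * prev_delta[j][i] + eta * local_grad[j] * inputs[i]
                  for i in range(len(inputs))]
                 for j in range(len(weights))]
    new_weights = [[weights[j][i] + new_delta[j][i]
                    for i in range(len(inputs))]
                   for j in range(len(weights))]
    return new_weights, new_delta

# One training step for a single output neuron (toy numbers).
inputs = [1.0, 0.5]
target = 1.0
weights = [[0.0, 0.0]]
prev_delta = [[0.0, 0.0]]

output = layer_forward(weights, inputs)[0]            # 0.5 with zero weights
# Local gradient of an output neuron with log-sigmoid activation.
grad = [(target - output) * output * (1.0 - output)]  # 0.125
weights, prev_delta = momentum_update(weights, prev_delta, grad, inputs)

print(weights)                                        # [[0.0625, 0.03125]]
print(layer_forward(weights, inputs)[0] > output)     # True
```

After the update the neuron's output moves toward the target; the stored `prev_delta` is what the momentum term reuses on the next iteration.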
3. PROPOSED MACHINE LEARNING TOOLBOX
The machine learning toolbox is developed using Python as the core programming language. The
main aim was to provide a code-free environment in which a user can easily try out
different algorithms with different parameters and thus choose the algorithm that fits the
dataset best.
It takes in the dataset in ARFF format. An ARFF (Attribute-Relation File Format) file is an ASCII
text file that describes a list of instances sharing a set of attributes. Two datasets were used
here for comparative analysis: the Iris dataset and the Pima Indian Diabetes dataset.
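To make the file format concrete, here is a minimal sketch of reading an ARFF document. It handles only the simple dense case (no quoted values, sparse rows or missing-value handling) and is not the toolbox's loader; the function name `parse_arff` and the sample text are assumptions for the example.

```python
def parse_arff(text):
    """Parse a minimal ARFF document into (relation, attributes, rows).

    Only comma-separated dense data is supported; values are kept as strings.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):       # blank line or comment
            continue
        lower = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

# Small illustrative sample in the style of an iris ARFF file.
SAMPLE = """\
% Comment lines start with a percent sign
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)           # iris
print(len(attributes))    # 5
print(rows[0])            # ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
```

The header section declares the attribute names and types, and every line after `@DATA` is one instance whose last field here is the class label.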
The first is the Iris dataset, or Fisher's Iris data set. It was introduced by Ronald Fisher
in his paper “The use of multiple measurements in taxonomic problems” as an example
of linear discriminant analysis. There are 50 samples from each of three species of Iris (Iris
setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the
length and the width of the sepals and petals, in centimeters. Based on the combination of these
four features, the task is to distinguish the species from each other. [8]
The other dataset is the Pima Indian Diabetes dataset, in which the task is to determine whether a given woman is
diabetic [9]. There are 7 predictors, including personal history and medical measurements, and all of
the women were Pima Indians.
The toolbox provides algorithms such as Naïve Bayes, SVM, Neural Network, Decision Tree and
K-nearest neighbors. It provides an easy-to-use GUI, which takes as input the basic parameters for each
algorithm. For example, for the neural network, there are fields to enter the learning rate, momentum,
number of hidden neurons and the maximum number of epochs for which it should be trained.
4. RESULTS
Accuracy of Results
Naïve Bayes Algorithm
Naïve Bayes was run on the iris dataset with 150 training instances and was tested on the same 150
instances. The accuracy was 96%. Naive Bayes was also run on the diabetes dataset
with about 750 training instances and was tested on 150 new instances; an accuracy between 70% and
76% was observed.
K-nearest Neighbors Algorithm
K-nearest neighbors was run on the iris and diabetes datasets and showed the following
results for various partitions into training and testing datasets:
Table 2 Accuracy results for iris dataset on K-nearest neighbors
K | Split (TRAIN:TEST)
  | 100:100 | 80:20 | 70:30
1 | 100% | 96.66% | 97.77%
3 | 96% | 96.66% | 95.55%
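The paper does not state how accuracy was computed. Assuming it is the percentage of test instances classified correctly, and assuming all 150 iris instances are used (so that the 80:20 and 70:30 splits leave 30 and 45 test instances), the figures in Table 2 are consistent with only one or two misclassified test instances. The sketch below is a consistency check under those assumptions; the function name and the labels are made up.

```python
def accuracy_percent(predicted, actual):
    """Percentage of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical labels: 30 test instances with a single mistake.
actual = ["setosa"] * 10 + ["versicolor"] * 10 + ["virginica"] * 10
predicted = list(actual)
predicted[15] = "virginica"
print(accuracy_percent(predicted, actual))   # about 96.67

# Counts of correct predictions that would reproduce the iris figures.
for correct, total in [(144, 150), (29, 30), (44, 45), (43, 45)]:
    print(correct, total, 100.0 * correct / total)
```

The last three ratios evaluate to roughly 96.67, 97.78 and 95.56, which match the reported 96.66%, 97.77% and 95.55% if the values were truncated rather than rounded.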
Table 3 Accuracy results for diabetes dataset on K-nearest neighbors
K | Split (TRAIN:TEST)
  | 100:100 | 80:20 | 70:30
1 | 100% | 65.58% | 66.23%
3 | 85.41% | 70.12% | 70.56%
SVM Algorithm
The SVM classifier was run on the iris and diabetes datasets and showed the following results for
various partitions into training and testing datasets:
Table 4 Accuracy results for iris and diabetes dataset on SVM
Dataset | Split (TRAIN:TEST)
         | 100:100 | 80:20 | 70:30
Iris | 82% | 93.33% | 95.55%
Diabetes | 74.60% | 67.53% | 62.77%
Decision Tree Algorithm
The Decision Tree classifier was run on the iris and diabetes datasets and showed the following
results for different partitions into training and testing datasets:
Table 5 Accuracy results for iris and diabetes dataset on Decision Tree
Dataset | Split (TRAIN:TEST)
         | 100:100 | 80:20 | 70:30
Iris | 100% | 70.97% | 73.91%
Diabetes | 100% | 53.55% | 53.45%
Neural Network Backpropagation Algorithm
The Neural Net classifier was run on the iris and diabetes datasets and showed the following
results for various partitions into training and testing datasets:
Table 6 Accuracy results for iris and diabetes dataset on Neural Network
Dataset | Hidden Layers | Split (TRAIN:TEST)
         |               | 100:100 | 80:20 | 70:30
Iris | 4 | 76% | 75% | 70.47%
Iris | 5 | 97.33% | 75% | 92.35%
Diabetes | 4 | 65.10% | 65.31% | 64.05%
Diabetes | 5 | 65.10% | 65.53% | 65.41%
Summarized Result
Running different classifiers on different datasets with different parameters shows a lot of
variation in the accuracy results on the iris dataset. This can be summarized as:
Table 7 Comparison of results on iris and diabetes datasets on different algorithms
Algorithm | Average Accuracy | Remarks
Naïve Bayes | 88% | The fastest of all algorithms with a higher accuracy rate.
K-nearest Neighbors | 96% | This algorithm worked best on the iris dataset.
SVM | 84% | Slower than most algorithms and low accuracy due to random choice of attributes.
Decision Tree | 80% | Moderate time and accuracy results.
Neural Network | 78% | Slowest of all algorithms and works well with a low number of hidden neurons.
5. CONCLUSIONS AND FUTURE SCOPE
We can easily apply the implemented ML algorithms to any data set with a set of numerical features
by setting the appropriate parameters through our GUI. This can be a useful tool for
researchers and students who wish to apply an ML algorithm to their dataset and understand which
algorithm works better.
We are currently working on adding more ML algorithms and graphical
interpretations to compare the effectiveness of the algorithms. We expect that a fully developed
toolbox available on the web could also do well commercially; we were unable to find a similar
toolbox on the web. We will also include unsupervised methods in upcoming
versions.
REFERENCES
1. T. M. Mitchell, “Machine Learning,” McGraw Hill, 1997.
2. Ch. M. Bishop, “Pattern Recognition and Machine Learning,” Springer, 2006.
3. P. Langley, W. Iba., and K. Thompson. Decision making using probabilistic inference methods. In
Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence (UAI’92), pages
399–406, San Mateo, CA, 1992.
4. Nitin Bhatia, Vandana, “Survey on nearest neighbor techniques,” IJCSIS, Vol 80, no 2 (2010).
5. HSSINA, Badr et al. "A Comparative Study Of Decision Tree ID3 And C4.5". International
Journal of Advanced Computer Science and Applications 4.2 (2014): n. pag. Web.
6. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). Support Vector Machines. In An
Introduction to Statistical Learning (pp. 337-372). Springer New York.
7. S.B. Kotsiantis, Supervised Machine Learning: A Review of Classification Techniques,
Informatica 31(2007) 249-268, 2007.
8. Wikipedia (2012). Iris flower data set. Retrieved 18 April, 2016,
https://en.wikipedia.org/wiki/Iris_flower_data_set.
9. https://archive.ics.uci.edu/ml/datasets.html
ACKNOWLEDGEMENT
Dr. Pankaj Agarwal is currently Professor & Head, Department of
Computer Science & Engineering, IMS Engineering College, Ghaziabad, U.P,
India. He has more than 15 years of teaching and more than 8 years of research
experience. Dr. Agarwal has published research papers in more than 30
international journals and conferences of repute. He has also authored five books
on subjects including Algorithm Analysis, Software Project Management, .NET,
Cyber Laws and Computer Programming. His current areas of interest include
soft computing, machine learning and data analytics. I would like to thank my
team of students, Akshay, Rohit, Aman and Suryansh, studying in B.Tech
4th year, who took up this project and completed the assigned tasks as per the
desired expectations. We plan to add many new features to this toolbox.

Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
Automated Question Paper Generator And Answer Checker Using Information Retri...
Automated Question Paper Generator And Answer Checker Using Information Retri...Automated Question Paper Generator And Answer Checker Using Information Retri...
Automated Question Paper Generator And Answer Checker Using Information Retri...Sheila Sinclair
 
Top 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdfTop 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdfJetender Sharma
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesIRJET Journal
 

Ähnlich wie MACHINE LEARNING TOOLBOX (20)

T0 numtq0n tk=
T0 numtq0n tk=T0 numtq0n tk=
T0 numtq0n tk=
 
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
 
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data Streams
 
IJET-V2I6P32
IJET-V2I6P32IJET-V2I6P32
IJET-V2I6P32
 
Incremental learning from unbalanced data with concept class, concept drift a...
Incremental learning from unbalanced data with concept class, concept drift a...Incremental learning from unbalanced data with concept class, concept drift a...
Incremental learning from unbalanced data with concept class, concept drift a...
 
A Formal Machine Learning or Multi Objective Decision Making System for Deter...
A Formal Machine Learning or Multi Objective Decision Making System for Deter...A Formal Machine Learning or Multi Objective Decision Making System for Deter...
A Formal Machine Learning or Multi Objective Decision Making System for Deter...
 
Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers
 
A Survey on Machine Learning Algorithms
A Survey on Machine Learning AlgorithmsA Survey on Machine Learning Algorithms
A Survey on Machine Learning Algorithms
 
Analysis Levels And Techniques A Survey
Analysis Levels And Techniques   A SurveyAnalysis Levels And Techniques   A Survey
Analysis Levels And Techniques A Survey
 
Text Classification in the Field of Search Engines
Text Classification in the Field of Search EnginesText Classification in the Field of Search Engines
Text Classification in the Field of Search Engines
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
Evaluation of a New Incremental Classification Tree Algorithm for Mining High...
Evaluation of a New Incremental Classification Tree Algorithm for Mining High...Evaluation of a New Incremental Classification Tree Algorithm for Mining High...
Evaluation of a New Incremental Classification Tree Algorithm for Mining High...
 
EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH...
EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH...EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH...
EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH...
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 
Analysis of machine learning algorithms for character recognition: a case stu...
Analysis of machine learning algorithms for character recognition: a case stu...Analysis of machine learning algorithms for character recognition: a case stu...
Analysis of machine learning algorithms for character recognition: a case stu...
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Automated Question Paper Generator And Answer Checker Using Information Retri...
Automated Question Paper Generator And Answer Checker Using Information Retri...Automated Question Paper Generator And Answer Checker Using Information Retri...
Automated Question Paper Generator And Answer Checker Using Information Retri...
 
Top 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdfTop 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdf
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
 

KĂŒrzlich hochgeladen

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 

KĂŒrzlich hochgeladen (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

MACHINE LEARNING TOOLBOX

Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.3, September 2016
DOI:10.5121/mlaij.2016.3303

Dr. Pankaj Agarwal (Professor), Akshay Bhasin, Rohit Keshwani, Aman Verma and Suryansh Kaushik
B.Tech CS 4th Yr, Department of Computer Science, IMS Engineering College, Ghaziabad, U.P, India

ABSTRACT

This paper reviews and comparatively evaluates a few well-known machine learning algorithms in terms of their suitability and code performance on a given data set of any size. We describe our Machine Learning ToolBox, which we have built using the Python programming language. The algorithms in the toolbox are supervised classification algorithms: Naïve Bayes, Decision Trees, SVM, K-nearest Neighbors and Neural Network (Backpropagation). The algorithms are tested on the iris and diabetes datasets and are compared on the basis of their accuracy under different conditions. However, using our tool one can apply any of the implemented ML algorithms to any dataset of any size. The main goal of building the toolbox is to provide users with a platform to test their datasets on different machine learning algorithms and use the accuracy results to determine which algorithm fits the data best. The toolbox allows the user to choose a dataset in either structured or unstructured form and then choose the features to use for training the machine. We give our concluding remarks on the performance of the implemented algorithms based on experimental analysis.

KEYWORDS

Machine Learning, Neural Networks, Backpropagation, Decision Tree (C4.5), SVM (Support Vector Machines), K-means Clustering, K-nearest Neighbors.

1. INTRODUCTION

Machine Learning is simply the ability of a machine to learn from some set of data and make inferences on the basis of that data.
As explained by Tom Mitchell [1], machine learning can best be defined as: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” The beauty of machine learning models is that when they are exposed to new data, they are able to adapt independently. There is still a need in the market for a quick way to try different datasets on different algorithms with different parameters. Such a tool can be valuable to researchers, students, teachers and professionals. The goal of the machine learning toolbox is to provide an easy way to upload datasets in text file format and apply various machine learning algorithms. The toolbox will provide options such as using the same data for both training and testing, or splitting the training and testing
datasets in some proportions. The toolbox will provide suitable options for each algorithm, such as fields for the number of hidden neurons and the learning rate in the case of a neural network. This way the user can interact with the toolbox easily, try out different settings, and see which configuration of an algorithm suits the dataset well.

Machine Learning Toolbox is a research project meant to provide the user with a complete package for finding out how different machine learning approaches and algorithms work on a particular dataset. The toolbox allows the user to provide a dataset in either structured or unstructured form and to choose the features to use for training the machine. In this manner, the user has complete control over the comparison process.

2. SUPERVISED MACHINE LEARNING APPROACHES

Machine Learning ToolBox uses various supervised learning techniques. Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce a general hypothesis, which then makes predictions about future instances. In other words, the main aim of supervised learning is to build a model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to testing instances where the values of the predictor features are known but the value of the class label is unknown. In the present paper, we focus on the algorithms necessary to do this. In particular, this work is concerned with classification problems in which the input features take continuous, real values and the output is one of a finite set of class labels [2].
Table 1 gives an example of the iris dataset format, with four features expressed as real values and three possible output classes expressed numerically.

Table 1 Example of Iris Data Set for applying supervised learning algorithm

Feature 1            Feature 2           Feature 3            Feature 4           Known output class
sepal length in cm   sepal width in cm   petal length in cm   petal width in cm   Iris Setosa / Iris Versicolour / Iris Virginica

Inductive machine learning is the process of creating a classifier model that can later be used to make predictions for new datasets. The process of applying supervised machine learning to a real-world problem is described in Figure 1.
Figure 1 Supervised Learning Flow Diagram

2.1 Naïve Bayes Classifier

Naïve Bayes techniques are a set of supervised learning algorithms based on Bayes' theorem with the assumption of independence between every pair of features. Naïve Bayes is a simple method for constructing classifiers: the models assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set [3]. Naïve Bayes requires only a small amount of training data to estimate the parameters necessary for classification. Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes' theorem states the following relationship:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

The different Naïve Bayes classifiers differ mainly in the assumptions they make regarding the distribution of $P(x_i \mid y)$. Naïve Bayes learners and classifiers can be extremely fast compared to more sophisticated methods.

2.2 K-nearest Neighbors Classifier

KNN is a very popular classification algorithm that performs well and takes little time to train. The k-Nearest Neighbors algorithm (k-NN for short) is a non-parametric method used for classification and regression. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. However, it has a few disadvantages. First, all the neighbors are treated equally in the standard k-NN algorithm. Second, it is sensitive to imbalanced data: large classes always have a better chance to win [4].
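As a rough illustration of the majority-vote idea behind k-NN, the sketch below classifies a query point from a handful of labelled points. It is a minimal sketch and not the toolbox's code: the function name `knn_predict`, the toy data, and the use of Euclidean distance via `math.dist` (Python 3.8+) are assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training instance
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest instances
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Return the most common label among them
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two features, two classes
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))
```

Because every neighbor gets one equal vote, a class with many more training points can dominate the vote for large k, which is the imbalance problem noted above.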
Step 1) Search for the K training instances that are closest to the unknown instance.
Step 2) Pick the most commonly occurring class label among these K instances.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label that is most frequent among the k training samples nearest to that query point. A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as in text classification, another metric can be used, such as the overlap metric (or Hamming distance).

2.3 Decision Tree Classifier

Decision Tree Classifier is a simple and widely used classification technique. It poses a series of questions about the attributes of a record: each time it receives an answer, a follow-up question is asked until a conclusion about the class label of the record is reached. A decision tree is a tree-like structure in which decisions are taken based on the values of some attributes, and further branching is done accordingly. A decision tree classifies a record by starting at the root node and comparing the attribute values as specified in the nodes until a leaf node is reached. The leaf node contains the class label. Its aim is to partition a dataset into groups that are as homogeneous as possible in terms of the variable to be predicted [5].

Creating an optimal decision tree is a key problem in decision tree classification. In many cases a decision tree can be constructed from a given set of attributes, but in some cases finding the optimal tree is computationally infeasible because of the exponential size of the search space. However, various efficient algorithms have been developed to construct a reasonably accurate, though suboptimal, decision tree in a reasonable amount of time.
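To make the idea of scoring a candidate partition concrete, the sketch below computes the entropy-based information gain of splitting one numeric attribute at a threshold. This is only an assumption for illustration: information gain is the criterion used by ID3, and the paper does not state which criterion the toolbox uses. The function names and toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric attribute at `threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Toy attribute values and labels (made up for illustration)
petal_len = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa", "other", "other", "other"]
print(information_gain(petal_len, species, 2.0))  # this threshold separates the classes fully
```

A tree builder would evaluate such a score for each candidate attribute and threshold, keep the best one, and repeat on each resulting subset.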
These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally optimal decisions about which attribute to use for partitioning the data. For example, Hunt's algorithm, ID3, C4.5, CART and SPRINT are greedy decision tree induction algorithms.

2.4 SVM Classifier

The support vector machine (SVM) is a training algorithm that learns a classification rule from the data set; the trained classifier is then used to predict the class of a new sample. SVM is based on the concept of decision planes that define decision boundaries; the points that form the decision boundary between the classes are called support vectors and are treated as parameters [6]. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks, as illustrated in Figure 2.
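The prediction step described above, deciding which side of the hyperplane a new example falls on, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the weight vector `w` and offset `b` are hand-picked rather than learned, and the training step that finds the widest-margin hyperplane is not shown.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical, hand-picked hyperplane x1 + x2 = 4 (not learned from data)
w, b = (1.0, 1.0), -4.0
print(predict(w, b, (3.0, 3.0)))
print(predict(w, b, (1.0, 1.0)))
```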
Figure 2 SVM Illustration

2.5 Artificial Neural Network Classifier

An Artificial Neural Network (ANN) is a network of interconnected nodes, each representing some value and connected to other nodes via weighted edges. The network is parallel in the sense that the neurons are arranged in layers and neurons in adjacent layers are connected. ANNs were designed to mimic the working of a biological nervous system, where each neuron contributes some amount to the final output. Neural networks are made up of layers, which are in turn made up of a number of interconnected nodes, each containing an activation function. The nodes of the input layer are connected to the nodes of the hidden layer by weighted edges. The hidden layers are linked to an output layer where the answer is received. The network is represented in Figure 3.

Figure 3 Neural Network Illustration

These networks have been established to be universal approximators of continuous/discontinuous functions and therefore they are suitable for usage where we have some information about the
input-output map to be approximated. A set of data (input-output information) is used for training the network. Once the network has been trained, any input can be given and it will produce an output corresponding to the expected output of the approximated mapping. The activation function used is the log-sigmoid function, which can be expressed as:

$$f(\mathbf{x}) = \frac{1}{1 + e^{-\sum_i w_i x_i}}$$

where the array $w$ contains the weights of the edges and the array $x$ contains the outputs of the previous layer. For the hidden layer, $x$ corresponds to the input of the network, while for the output layer, $x$ corresponds to the output of the hidden layer. The network is trained using the error backpropagation algorithm. The weight update rule can be expressed as:

$$\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n)$$

where $\alpha$ is called the momentum constant, $\eta$ is the learning rate, $\Delta w_{ji}(n)$ is the correction applied to the synaptic weight connecting the output of neuron $i$ to the input of neuron $j$ at iteration $n$, $\delta_j(n)$ is the local gradient at the $n$th iteration, and $y_i(n)$ is the function signal appearing at the output of neuron $i$ at iteration $n$. The target percentage error is specified beforehand according to the problem, and the network tries to converge to that error using the backpropagation technique. The advantages of the neural network approach are that it is robust to noisy data, adaptable, nonparametric, robust with respect to node failure, and empirically shown to work well in many problem domains.

3. PROPOSED MACHINE LEARNING TOOLBOX

The machine learning toolbox is developed using Python as the core programming language. The main aim was to provide a code-free environment in which a user can easily try out different algorithms with different parameters and choose the algorithm that fits the dataset best. It takes in the dataset in ARFF format.
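For readers who have not seen the format, the sketch below shows a small ARFF document and a hypothetical reader for it. It is a minimal sketch, not the toolbox's loader: the function `read_arff` is invented for this example and handles only dense data with unquoted attribute names, ignoring sparse rows, quoting and type checking.

```python
def read_arff(text):
    """Parse a small, dense ARFF document into (attribute names, data rows)."""
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@attribute"):
            attributes.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True                      # every later line is an instance
    return attributes, rows

sample = """\
% Toy excerpt in the style of the Iris data
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""
names, rows = read_arff(sample)
print(names)
print(rows[0])
```

The header lines (`@relation`, `@attribute`) declare the attributes, and every line after `@data` is one comma-separated instance.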
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. Two datasets were used here for comparative analysis: the Iris dataset and the Pima Indian Diabetes dataset. The Iris dataset, or Fisher's Iris data set, was introduced by Ronald Fisher in his paper “The use of multiple measurements in taxonomic problems” as an example of linear discriminant analysis. There are 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the
length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, the task is to distinguish the species from each other [8]. The other dataset is the Pima Indian Diabetes dataset, where the task is to determine whether a given woman is diabetic [9]. There are 7 predictors, including personal history and medical measurements, and all of the women were Pima Indians.

The toolbox provides the Naïve Bayes, SVM, Neural Network, Decision Tree and K-nearest Neighbors algorithms. It provides an easy-to-use GUI, which takes as input the basic parameters for each algorithm. For example, for the neural network there are fields to enter the learning rate, momentum, number of hidden neurons and the maximum number of epochs for which it should be trained.

4. RESULTS

Accuracy of Results

Naïve Bayes Algorithm

Naïve Bayes was run on the iris dataset with 150 training instances and was tested on the same 150 instances. The accuracy was 96%. Naïve Bayes was also run on the diabetes dataset with about 750 training instances and tested on 150 new instances; an accuracy of 70-76% was observed.

K-nearest Neighbors Algorithm

K-nearest neighbors was run on the iris and diabetes datasets and showed the following results for various training/testing partitions:

Table 2 Accuracy results for iris dataset on K-nearest neighbors

       Split (TRAIN:TEST)
K      100:100    80:20     70:30
1      100%       96.66%    97.77%
3      96%        96.66%    95.55%

Table 3 Accuracy results for diabetes dataset on K-nearest neighbors

       Split (TRAIN:TEST)
K      100:100    80:20     70:30
1      100%       65.58%    66.23%
3      85.41%     70.12%    70.56%
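The split columns in the tables above (100:100, 80:20, 70:30) refer to how the instances are divided between training and testing. The sketch below shows one way such a partition and the accuracy figure could be computed. It is a sketch under assumptions: the paper does not say whether the toolbox shuffles before splitting, so the seeded shuffle and the function names `split_dataset` and `accuracy` are illustrative only.

```python
import random

def split_dataset(rows, train_fraction, seed=0):
    """Shuffle rows and split them into (train, test) by the given fraction."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(predicted, actual):
    """Percentage of predictions that match the known labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

rows = list(range(150))                     # stand-in for 150 labelled instances
train, test = split_dataset(rows, 0.8)
print(len(train), len(test))                # 120 30
print(accuracy(["a", "b", "a"], ["a", "b", "b"]))
```

In the 100:100 case the classifier is scored on the same instances it was trained on, which is why 1-nearest-neighbor reaches 100% there: each point's nearest neighbor is itself.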
SVM Algorithm
The SVM classifier was run on the iris and diabetes datasets and showed the following results for various partitions into training and testing data:

Table 4 Accuracy results for iris and diabetes dataset on SVM
Dataset     Split (TRAIN:TEST)
            100:100    80:20     70:30
Iris        82%        93.33%    95.55%
Diabetes    74.60%     67.53%    62.77%

Decision Tree Algorithm
The Decision Tree classifier was run on the iris and diabetes datasets and showed the following results for different partitions into training and testing data:

Table 5 Accuracy results for iris and diabetes dataset on Decision Tree
Dataset     Split (TRAIN:TEST)
            100:100    80:20     70:30
Iris        100%       70.97%    73.91%
Diabetes    100%       53.55%    53.45%

Neural Network Backpropagation Algorithm
The Neural Network classifier was run on the iris and diabetes datasets and showed the following results for various partitions into training and testing data:

Table 6 Accuracy results for iris and diabetes dataset on Neural Network
Dataset     Hidden Layers    Split (TRAIN:TEST)
                             100:100    80:20     70:30
Iris        4                76%        75%       70.47%
            5                97.33%     75%       92.35%
Diabetes    4                65.10%     65.31%    64.05%
            5                65.10%     65.53%    65.41%
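The TRAIN:TEST partitions used in the tables above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `train_test_split` and `accuracy_percent` are hypothetical, the split is a seeded random shuffle (the paper does not say how the toolbox partitions the data), and a list of 150 indices stands in for the iris samples.

```python
import random


def train_test_split(samples, train_percent, seed=0):
    """Shuffle a copy of `samples` and cut it into (train, test) parts.

    `train_percent` is an integer such as 80 for an 80:20 split.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified test samples."""
    return 100.0 * correct / total


# Stand-in for a 150-sample dataset such as iris (indices only, no real measurements).
iris_like = list(range(150))

train80, test20 = train_test_split(iris_like, 80)  # 120 training, 30 test samples
train70, test30 = train_test_split(iris_like, 70)  # 105 training, 45 test samples
```

With only 30 or 45 test samples, a single misclassification changes the accuracy by roughly 3.3 or 2.2 percentage points, so small differences between splits should be read with care. Under these assumed test-set sizes, 29 correct out of 30 gives about 96.67% and 44 out of 45 gives about 97.78%, which is close to the k-nearest neighbors figures reported for iris.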
Summarized Result
Running different classifiers on different datasets with different parameters shows a lot of variation in the accuracy results on the iris dataset. This can be summarized as:

Table 7 Comparison of results on iris and diabetes datasets on different algorithms
Algorithm              Average Accuracy    Remarks
Naïve Bayes            88%                 The fastest of all algorithms, with a comparatively high accuracy rate.
K-nearest Neighbors    96%                 This algorithm worked best on the iris dataset.
SVM                    84%                 Slower than most algorithms; low accuracy due to random choice of attributes.
Decision Tree          80%                 Moderate time and accuracy results.
Neural Network         78%                 Slowest of all algorithms; works well with a low number of hidden neurons.

5. CONCLUSIONS AND FUTURE SCOPE
The implemented ML algorithms can be applied to any dataset with a set of numerical features by setting the appropriate parameters through our GUI. This can be a useful tool for researchers and students who wish to apply an ML algorithm to their own dataset and understand which algorithm works better. We are currently working on adding many more ML algorithms and on adding graphical interpretations to compare the effectiveness of the algorithms. A fully developed toolbox available on the web is also expected to do well commercially; we were unable to find a similar toolbox on the web. We will also include unsupervised methods in upcoming versions.
REFERENCES
1. T. M. Mitchell, “Machine Learning,” McGraw Hill, 1997.
2. Ch. M. Bishop, “Pattern Recognition and Machine Learning,” Springer, 2006.
3. P. Langley, W. Iba, and K. Thompson, “Decision making using probabilistic inference methods,” in Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence (UAI’92), pages 399–406, San Mateo, CA, 1992.
4. Nitin Bhatia, Vandana, “Survey on nearest neighbor techniques,” IJCSIS, Vol 80, no 2 (2010).
5. Hssina, Badr et al., “A Comparative Study Of Decision Tree ID3 And C4.5,” International Journal of Advanced Computer Science and Applications 4.2 (2014): n. pag. Web.
6. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). Support Vector Machines. In An Introduction to Statistical Learning (pp. 337-372). Springer New York.
7. S. B. Kotsiantis, “Supervised Machine Learning: A Review of Classification Techniques,” Informatica 31 (2007) 249-268.
8. Wikipedia (2012). Iris flower data set. Retrieved 18 April 2016, from https://en.wikipedia.org/wiki/Iris_flower_data_set.
9. https://archive.ics.uci.edu/ml/datasets.html
ACKNOWLEDGEMENT
Dr. Pankaj Agarwal is currently Professor & Head, Department of Computer Science & Engineering, IMS Engineering College, Ghaziabad, U.P, India. He has more than 15 years of teaching experience and more than 8 years of research experience. Dr. Agarwal has published research papers in more than 30 international journals and conferences of repute. He has also authored five books on subjects including Algorithm Analysis, Software Project Management, .NET, Cyber Laws and Computer Programming. His current areas of interest include soft computing, machine learning and data analytics.
I would like to thank my team of students, Akshay, Rohit, Aman and Suryansh, studying in B.Tech 4th year, who took up this project and completed the assigned tasks as expected. We plan to add many new features to this toolbox.