Q. Write the advantages, disadvantages and applications of the different
algorithms used in Data Mining.
Ans. Decision Trees
In simple words, a decision tree is a structure that contains nodes (rectangular
boxes) and edges (arrows) and is built from a dataset (a table whose columns
represent features/attributes and whose rows correspond to records). Each node
is either used to make a decision (known as a decision node) or represents an
outcome (known as a leaf node).
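The node-and-edge structure described above can be sketched in a few lines of Python. This is only a minimal illustration, not a learning algorithm: the class names `Leaf` and `DecisionNode`, the `predict` helper and the toy weather tree are all hypothetical and do not come from any dataset in these notes.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds an outcome (class label)."""
    outcome: str


@dataclass
class DecisionNode:
    """Decision node: tests one feature; each edge is one value of that feature."""
    feature: str
    children: Dict[str, Union["DecisionNode", Leaf]]


def predict(node, record):
    """Walk from the root to a leaf, following the edge that matches the record."""
    while isinstance(node, DecisionNode):
        node = node.children[record[node.feature]]
    return node.outcome


# Hypothetical hand-built tree (toy example, for illustration only).
tree = DecisionNode("outlook", {
    "sunny": DecisionNode("windy", {"yes": Leaf("stay in"), "no": Leaf("play")}),
    "rainy": Leaf("stay in"),
})

print(predict(tree, {"outlook": "sunny", "windy": "no"}))
```

In practice the tree is not written by hand but built from the dataset by an algorithm such as ID3 or CART.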
1. Naive Bayes Classifier (NBC)
Naive Bayes is a machine learning algorithm used to solve classification
problems. It is based on Bayes' Theorem. It is one of the simplest yet most
powerful ML algorithms in use and finds applications in many industries.
Suppose you have to solve a classification problem and have created the features
and generated the hypothesis, but your superiors want to see the model. You have
numerous data points (lakhs of data points) and many variables on which to train
the model. The best solution in this situation would be to use the Naive Bayes
classifier, which is much faster than many other classification algorithms.
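For reference, Bayes' Theorem combined with the "naive" assumption that the features are independent given the class can be written as below. The symbols are introduced here only for illustration: C stands for a class and x_1, ..., x_n for the feature values of one record.

```latex
P(C \mid x_1, \dots, x_n)
  = \frac{P(C)\,\prod_{i=1}^{n} P(x_i \mid C)}{P(x_1, \dots, x_n)}
```

The predicted class is the one with the largest numerator, since the denominator is the same for every class.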
Advantages of NBC:
1) The naive Bayesian model originated from classical mathematical theory
and has a solid mathematical foundation and stable classification
efficiency;
2) It is fast for large numbers of training samples and queries. Even with
very large training sets, there is usually only a relatively small number of
features for each item, and training and classifying an item
is only a matter of computing feature probabilities;
3) It works well for small-scale data, can handle multi-category tasks, and is
suitable for incremental training (that is, it can train on new samples in real
time);
4) It is less sensitive to missing data, and the algorithm is relatively simple;
it is often used for text classification;
5) Naive Bayes results are easy to explain.
Disadvantages of NBC:
1) The classification decision always carries some error rate;
2) It is very sensitive to the form in which the input data is presented;
3) It assumes that sample attributes are independent, so if the sample
attributes are related, the results are poor.
4) Naive Bayes assumes that all predictors (or features) are independent,
which rarely happens in real life. This limits the applicability of this algorithm
in real-world use cases.
5) This algorithm faces the ‘zero-frequency problem’: it assigns zero
probability to a categorical variable whose category in the test data set
was not present in the training dataset. A smoothing technique, such as
Laplace smoothing, overcomes this issue.
6) Its probability estimates can be wrong in some cases, so its
probability outputs should not be taken too seriously.
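The zero-frequency problem in point 5, and how smoothing helps, can be seen in a small sketch. This is a minimal categorical Naive Bayes written for illustration under stated assumptions: the function names `train` and `posterior`, the parameter `alpha` (the additive smoothing constant, 1.0 giving Laplace smoothing) and the toy weather records are all made up here.

```python
from collections import Counter, defaultdict


def train(records, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> values seen in training
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def posterior(record, model, alpha=1.0):
    """Return normalised P(class | record); alpha=0 disables smoothing."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                                # prior P(class)
        for i, value in enumerate(record):
            k = len(vocab[i] | {value})                  # distinct values, incl. unseen
            count = value_counts[(label, i)][value]
            score *= (count + alpha) / (n + alpha * k)   # smoothed likelihood
        scores[label] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()} if norm else scores


# Toy training data (hypothetical): (outlook, temperature) -> play?
records = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(records, labels)

query = ("overcast", "cool")                  # "overcast" never appears in training
print(posterior(query, model, alpha=0.0))     # every class gets probability zero
print(posterior(query, model, alpha=1.0))     # smoothing keeps the classes comparable
```

Without smoothing, the single unseen value wipes out the evidence from every other feature; with `alpha=1.0` the remaining feature still decides the class.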
Applications of Naive Bayes Algorithms
Real-time Prediction: As Naive Bayes is very fast, it can be used for
making predictions in real time.
Multi-class Prediction: This algorithm can predict the posterior probability of
multiple classes of the target variable.
Text classification / Spam Filtering / Sentiment Analysis: Naive Bayes
classifiers are mostly used in text classification, where (due to their good results
in multi-class problems and the independence rule) they often have a higher
success rate than other algorithms. As a result, they are widely used in spam
filtering (identifying spam e-mail) and sentiment analysis (in social media
analysis, to identify positive and negative customer sentiments).
Recommendation System: A Naive Bayes classifier combined with algorithms
like Collaborative Filtering makes a recommendation system that uses
machine learning and data mining techniques to filter unseen information
and predict whether a user would like a given resource or not.
2. Iterative Dichotomiser 3 (ID3)
ID3 stands for Iterative Dichotomiser 3. It is a classification algorithm, so
named because it iteratively (repeatedly) dichotomizes (divides) features
into two or more groups at each step. Invented by Ross Quinlan, ID3 uses a
top-down greedy approach to build a decision tree. In simple words, the
top-down approach means that we start building the tree from the top, and
the greedy approach means that at each step we select the best attribute, the one
that yields maximum Information Gain (IG) or minimum Entropy (H).
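The greedy attribute choice can be sketched as follows. This is a minimal illustration of the entropy and information gain calculation only, not a full ID3 implementation; the function names and the toy weather rows are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, features):
    """Greedy step: pick the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Toy data (hypothetical): "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_attribute(rows, labels, ["outlook", "windy"]))
```

A full ID3 would apply `best_attribute` recursively to each group of rows until every group contains a single class.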
Advantages of using ID3
1) Understandable prediction rules are created from the training data.
2) Builds the tree quickly.
3) Builds a short tree.
4) Only needs to test enough attributes until all data is classified.
5) Finding leaf nodes enables test data to be pruned, reducing the number of
tests.
6) The whole dataset is searched to create the tree.
Disadvantages of using ID3
1) Data may be over-fitted or over-classified if a small sample is tested.
2) Only one attribute at a time is tested for making a decision.
3) Classifying continuous data may be computationally expensive, as many
trees must be generated to see where to break the continuum.
Applications of ID3
The ID3 algorithm is used in many areas, for example land capability
classification and information asset identification.
3. K-Nearest Neighbours
KNN for Nearest Neighbour Search: The KNN algorithm involves retrieving the K
data points that are nearest in distance to the original point. It can be used for
classification or regression by aggregating the target values of the nearest
neighbours to make a prediction. However, just retrieving the nearest
neighbours is itself an important task in several applications. For instance,
suppose we write a movie recommender system. Once we find a suitable vector
representation for all the movies, recommending the five closest movies to a
given movie involves retrieving its five nearest neighbour vectors.
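The retrieval step can be sketched as below. It is a brute-force search under stated assumptions: the function name `nearest_neighbours`, the movie names and their two-dimensional vectors are invented for illustration, and Euclidean distance is assumed as the distance measure.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def nearest_neighbours(query, points, k):
    """Return the names of the k vectors in `points` closest to `query`."""
    ranked = sorted(points.items(), key=lambda item: dist(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical movie vectors (e.g. "action score", "romance score").
movies = {
    "A": (0.9, 0.1),
    "B": (0.8, 0.2),
    "C": (0.1, 0.9),
    "D": (0.2, 0.8),
    "E": (0.5, 0.5),
}

# Recommend the two movies closest to "A", excluding "A" itself.
others = {name: vec for name, vec in movies.items() if name != "A"}
print(nearest_neighbours(movies["A"], others, k=2))
```

Sorting every point is fine for a handful of vectors; large collections usually need an index structure instead of a full scan.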
KNN for Classification: KNN can be used for classification in a supervised
setting where we are given a dataset with target labels. For classification, KNN
finds the k nearest data points in the training set, and the target label is computed
as the mode of the target labels of these k nearest neighbours.
KNN for Regression: KNN can be used for regression in a supervised setting
where we are given a dataset with continuous target values. For regression, KNN
finds the k nearest data points in the training set, and the target value is computed
as the mean of the target values of these k nearest neighbours.
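The two prediction rules differ only in how the neighbours' targets are aggregated, which a short sketch makes visible. The function names, the toy points and the choice of Euclidean distance are assumptions for illustration.

```python
from collections import Counter
from math import dist
from statistics import mean


def k_nearest_targets(query, X, y, k):
    """Targets of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(X)), key=lambda i: dist(query, X[i]))
    return [y[i] for i in order[:k]]


def knn_classify(query, X, y, k=3):
    """Classification: the mode (most common label) of the k nearest neighbours."""
    return Counter(k_nearest_targets(query, X, y, k)).most_common(1)[0][0]


def knn_regress(query, X, y, k=3):
    """Regression: the mean target value of the k nearest neighbours."""
    return mean(k_nearest_targets(query, X, y, k))


# Toy training set (hypothetical): two well-separated groups of points.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
class_labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify((2, 2), X, class_labels))
print(knn_regress((8.5, 8.5), X, values))
```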
Advantages of KNN
1) K-NN is intuitive and simple: The K-NN algorithm is very simple to
understand and equally easy to implement. To classify a new data point, the
K-NN algorithm reads through the whole dataset to find the K nearest
neighbours.
2) K-NN has no assumptions: K-NN is a non-parametric algorithm, which
means there are no assumptions to be met to implement K-NN. Parametric
models like linear regression have many assumptions to be met by the data
before they can be applied, which is not the case with K-NN.
3) No Training Step: K-NN does not explicitly build any model; it simply
tags the new data entry based on learning from historical data. A new data entry
is tagged with the majority class among its nearest neighbours.
4) It constantly evolves: Because it is instance-based learning, K-NN is a
memory-based approach. The classifier immediately adapts as we collect
new training data. This allows the algorithm to respond quickly to changes in
the input during real-time use.
5) Very easy to implement for multi-class problems: Most classifier
algorithms are easy to implement for binary problems and need extra effort to
implement for multiple classes, whereas K-NN adjusts to multiple classes without
any extra effort.
6) Can be used for both Classification and Regression: One of the biggest
advantages of K-NN is that it can be used for both classification and
regression problems.
7) One Hyper Parameter: K-NN might take some time while selecting the
first hyper parameter, K, but after that the rest of the parameters are aligned to it.
8) Variety of distance criteria to choose from: The K-NN algorithm gives the
user the flexibility to choose the distance measure while building a K-NN model.
a. Euclidean Distance
b. Hamming Distance
c. Manhattan Distance
d. Minkowski Distance
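The four distance measures listed above can each be written in a line or two. The sketch below uses hypothetical function names; note that Manhattan and Euclidean distance are the Minkowski distance with p = 1 and p = 2 respectively.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(hamming("karolin", "kathrin"))
```

Hamming distance suits categorical or binary features, while the other three apply to numeric features.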
Although K-NN has several advantages, it also has some very important
disadvantages or constraints.
Disadvantages of KNN
1) K-NN is a slow algorithm: K-NN might be very easy to implement, but as the
dataset grows, the efficiency or speed of the algorithm declines very fast.
2) Curse of Dimensionality: KNN works well with a small number of input
variables, but as the number of variables grows, the K-NN algorithm struggles
to predict the output of a new data point.
3) K-NN needs homogeneous features: If you decide to build K-NN using a
common distance, like the Euclidean or Manhattan distance, it is
necessary that features have the same scale, since absolute differences in
features are weighted equally, i.e., a given distance in feature 1 must mean the
same for feature 2.
4) Optimal number of neighbours: One of the biggest issues with K-NN is
choosing the optimal number of neighbours to consider while
classifying a new data entry.
5) Imbalanced data causes problems: K-NN doesn’t perform well on
imbalanced data. If we consider two classes, A and B, and the majority of
the training data is labelled as A, then the model will ultimately give a lot
of preference to A. This might result in the less common class B being
wrongly classified.
6) Outlier sensitivity: The K-NN algorithm is very sensitive to outliers, as it
simply chooses the neighbours based on distance criteria.
7) Missing Value treatment: K-NN inherently has no capability of dealing
with the missing value problem.
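Point 3 (homogeneous features) is easy to demonstrate. In the sketch below, min-max scaling is used as one possible way to bring features onto the same scale; the function names and the (age, income) rows are assumptions made up for the example.

```python
from math import dist


def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid dividing by zero
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def nearest(index, rows):
    """Index of the row closest (Euclidean) to rows[index], excluding itself."""
    others = [i for i in range(len(rows)) if i != index]
    return min(others, key=lambda i: dist(rows[index], rows[i]))


# Hypothetical records: (age in years, income in rupees).
rows = [(25, 30000), (27, 32000), (60, 30500), (40, 60000)]

print(nearest(0, rows))                 # income dominates the raw distance
print(nearest(0, min_max_scale(rows)))  # after scaling, age counts as well
```

On the raw data the 25-year-old is matched with the 60-year-old because their incomes are close; after scaling, the 27-year-old becomes the nearest neighbour.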
Applications of KNN
Used in classification and interpretation (legal, news, banking)
Used to fill in missing values
Used in pattern recognition
Used in gene expression analysis
Used in protein-protein interaction prediction
Used to predict the 3D structure of proteins
Used to measure document similarity
Problem solving (planning, pronunciation)
Functional learning (dynamic control)
Teaching and aiding (help desk, user training)
4. Classification and Regression Trees (CART) Algorithm
Classification and Regression Trees (CART) is a modern term for what are
otherwise known as Decision Trees. Decision Trees have been around for a very
long time and are important for predictive modelling in Machine Learning. As
the name suggests, these trees are used for classification and regression
(prediction) problems.
These models are obtained by partitioning the data space and fitting a simple
prediction model within each partition. This is done recursively. We can represent
the partitioning graphically as a tree, hence the name.
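The recursive partitioning can be sketched for a single numeric feature. This is a simplified illustration, not the full CART procedure: it assumes the "simple prediction model" in each partition is just the mean of the targets, chooses splits by the sum of squared errors, and uses hypothetical function names and toy data.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (the per-partition model)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Try each midpoint between sorted x values; return (threshold, cost) or None."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


def build(xs, ys, depth=0, max_depth=2):
    """Recursively partition; a leaf is the mean target of its partition."""
    split = best_split(xs, ys) if depth < max_depth else None
    if split is None:
        return mean(ys)
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build(*zip(*left), depth + 1, max_depth),
            build(*zip(*right), depth + 1, max_depth))


def predict(tree, x):
    """Follow thresholds down to a leaf value."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


# Toy data (hypothetical): two clearly different regimes of x.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
tree = build(xs, ys, max_depth=1)
print(tree, predict(tree, 2), predict(tree, 11))
```

A classification tree follows the same recursion but scores splits with a class-impurity measure instead of squared error.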
They have withstood the test of time for the following reasons:
1. Very competitive with other methods
2. High efficiency
There are two types of tree. Classification Trees are used to separate a dataset
into different classes (generally used when we expect categorical classes). The
other type is Regression Trees, which are used when the class variable is
continuous (or numerical).
Advantages of CART
1) CART does not require any assumptions for underlying distributions.
2) It is easy to use and can quickly provide valuable insights.
3) CART can be used efficiently to assess massive datasets.
4) It can be further used to drill down to a particular cause and find effective, quick
solutions.
5) The solution is easily interpretable, intuitive and can be verified with
existing data.
6) It is a good way to present solutions to management.
Disadvantages of CART
1) The biggest limitation is that it is a nonparametric technique, so it is
not recommended to make any generalization about the underlying
phenomenon based upon the results observed. Although the rules obtained
through the analysis can be tested on new data, it must be remembered that
the model is built from the sample without making any inference
about the underlying probability distribution.
2) Another limitation of CART is that the tree becomes quite complex after
seven or eight layers.
3) Interpreting the results of such a deep tree is not intuitive.
Applications of CART:
CART is used in many areas of machine learning, such as blood donor
classification, spatial, environmental and ecological data, and hepatitis disease
data.
5. K-Means Clustering
The K-Means algorithm is an iterative algorithm that tries to partition the dataset
into K pre-defined, distinct, non-overlapping subgroups (clusters) where each data
point belongs to only one group. It tries to make the intra-cluster data points as
similar as possible while also keeping the clusters as different (far apart) as
possible. It assigns data points to a cluster such that the sum of the squared
distances between the data points and the cluster’s centroid (the arithmetic mean
of all the data points that belong to that cluster) is at a minimum. The less
variation we have within clusters, the more homogeneous (similar) the data points
are within the same cluster.
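The iteration can be sketched as alternating two steps: assign every point to its nearest centroid, then move each centroid to the mean of its points. The code below is a minimal illustration under stated assumptions: the function names, the toy points and the choice of the first and fourth points as initial centroids are all made up for the example.

```python
from math import dist
from statistics import mean


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(mean(axis) for axis in zip(*members)) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def within_cluster_sse(points, centroids, labels):
    """Sum of squared distances between points and their cluster centroid."""
    return sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


# Toy data (hypothetical): two obvious groups; K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
initial = [points[0], points[3]]  # arbitrary starting centroids

centroids, labels = kmeans(points, initial)
print(labels, within_cluster_sse(points, centroids, labels))
```

The value returned by `within_cluster_sse` is the quantity the algorithm tries to minimise.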
Advantages of K-means
1) Relatively simple to implement.
2) Scales to large data sets.
3) Guarantees convergence.
4) Can warm-start the positions of centroids.
5) Easily adapts to new examples.
6) Generalizes to clusters of different shapes and sizes, such as elliptical
clusters.
Disadvantages of K-means
1) Dependence on initial values
For a low k, you can mitigate this dependence by running k-means several
times with different initial values and picking the best result.
As k increases, you need advanced versions of k-means to pick better
values of the initial centroids (called k-means seeding).
2) Clustering data of varying sizes and density
K-means has trouble clustering data where clusters are of varying sizes and
density. To cluster such data, you need to generalize k-means.
3) Clustering outliers
Centroids can be dragged by outliers, or outliers might get their own cluster
instead of being ignored. Consider removing or clipping outliers before
clustering.
4) Scaling with number of dimensions
As the number of dimensions increases, a distance-based similarity
measure converges to a constant value between any given examples.
Reduce dimensionality either by using PCA on the feature data, or by using
“spectral clustering” to modify the clustering algorithm.
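As an illustration of the seeding mentioned under point 1, the sketch below follows the k-means++ scheme, which is one well-known seeding method; these notes do not say which method is meant, so treat the choice as an assumption. The function name `seed_centroids`, the toy points and the random seed are also made up for the example.

```python
import random
from math import dist


def seed_centroids(points, k, rng):
    """Pick k initial centroids: the first uniformly at random, each later one
    with probability proportional to its squared distance from the nearest
    centroid chosen so far. Assumes `points` has at least k distinct points."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Toy data (hypothetical): three separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (1.0, 9.0), (2.0, 9.0)]
rng = random.Random(0)  # fixed seed only so the run is repeatable

print(seed_centroids(points, 3, rng))
```

Because far-away points are more likely to be picked, the initial centroids tend to be spread across the data rather than bunched in one cluster.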
Applications of K-Means Clustering
K-Means clustering is used in a variety of real-life examples and business cases,
such as academic performance, diagnostic systems, search engines and wireless
sensor networks.
Academic Performance: Based on the scores, students are categorized
into grades like A, B, or C.
Diagnostic systems: The medical profession uses k-means in creating
smarter medical decision support systems, especially in the treatment of
Search engines: Clustering forms a backbone of search engines. When a
search is performed, the search results need to be grouped, and search
engines very often use clustering to do this.
Wireless sensor networks: The clustering algorithm plays the role of
finding the cluster heads, which collect all the data in their respective clusters.