Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
K-Means Clustering
Parameter Tuning & Use cases
What Is Covered
Terminologies
Introduction & Example
Standard input/tuning parameters & Sample UI
Sample output UI
Interpretation of Output
Limitations
Business use cases
Introduction With Example
What is it used for?
Clustering is a process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another and as similar as possible within each group.
It is simply a grouping of similar things or data points.
For example, objects within group 1 (cluster 1) shown in the image should be as similar to one another as possible, while an object in group 1 should differ markedly from an object in group 2.
The attributes of the objects decide which objects are grouped together.
In this way a natural grouping of data points can be achieved.
Some Examples
Let's take a few examples for more clarity:
Loan applicants in a bank can be grouped into low, medium, and high risk applicants based on their age, annual income, employment tenure, loan amount, times delinquent, etc., using the K-means clustering algorithm.
Users of a movie ticket booking website can be grouped into movie freaks, moderate watchers, and rare watchers based on their past ticket purchase behavior, such as days since the last movie seen, average number of tickets booked each time, frequency of ticket bookings per month, etc.
Retail customers can be grouped into loyal, infrequent, and rare customers based on their retail outlet or website visits per month, purchase amount per month, purchase frequency per month, etc.
K-means is used to find groups which have not been explicitly labeled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.
Once the algorithm has been run and the groups are defined, any new data can be easily assigned to the correct group.
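Assigning new data to an existing group amounts to picking the nearest cluster center. The sketch below is only an illustration: the two centers and the new observation are made-up values (height in cm, weight in kg), and `nearest_cluster` is a hypothetical helper name.

```python
import math

def nearest_cluster(point, centers):
    """Return the label of the center closest to `point` (Euclidean distance)."""
    return min(centers, key=lambda label: math.dist(point, centers[label]))

# Illustrative centers for two groups that have already been built.
centers = {"cluster 1": (182.0, 70.0), "cluster 2": (169.0, 58.0)}

# A new observation is simply given the label of its nearest center.
print(nearest_cluster((172, 60), centers))  # cluster 2
```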
How it works
Step 1: Begin with a decision on the value of k (the number of clusters, or groups) and on the input variables. Use the silhouette score to determine k.
Step 2: Scale the data using (x - min(x)) / (max(x) - min(x)) and initialize the cluster centers: randomly select k observations from the scaled data and treat them as the initial cluster centers.
Step 3: Calculate the Euclidean distance between an observation and each of the initial cluster centers.
• Each observation is assigned to the cluster whose center is at the minimum Euclidean distance.
Step 4: Move on to the next observation, calculate the Euclidean distances, assign the observation to the cluster at minimum distance as in step 3, and update that cluster's center.
Step 5: Repeat step 4 until all observations are assigned a cluster membership.
Step 6: Check the cluster plot and the silhouette score to measure the goodness of the clusters generated.
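The assignment loop in steps 3 to 5 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a production implementation: it seeds the centers with the first k observations instead of a random pick (so the result is reproducible), it skips the scaling of step 2, labels are 0-based, and `sequential_kmeans` is a hypothetical name.

```python
import math

def sequential_kmeans(points, k):
    """Single pass: seed with the first k points, then assign each remaining
    point to its nearest center and move that center to the mean of its members."""
    centers = [tuple(p) for p in points[:k]]
    members = [[p] for p in points[:k]]
    labels = list(range(k))
    for p in points[k:]:
        # Steps 3 and 4: nearest center by Euclidean distance.
        j = min(range(k), key=lambda c: math.dist(p, centers[c]))
        members[j].append(p)
        labels.append(j)
        # Update only the center of the cluster that received the point.
        centers[j] = tuple(sum(col) / len(members[j]) for col in zip(*members[j]))
    return labels, centers

# Four (height, weight) observations, unscaled.
labels, centers = sequential_kmeans([(185, 72), (170, 56), (168, 60), (179, 68)], k=2)
print(labels)   # [0, 1, 1, 0]
print(centers)  # [(182.0, 70.0), (169.0, 58.0)]
```

Because each center moves as soon as a point joins it, the result depends on the order in which the observations arrive.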
How it works – Steps
Data sample:
Height  Weight
185     72
170     56
168     60
179     68
182     72
188     77
180     71
180     70
183     84
180     88
180     67
177     76
Initial cluster centers:
Cluster  Height  Weight
K1       185     72
K2       170     56
Step 1: Input
• Scaled variables and number of clusters (k)
• In this example, only two variables, height and weight, are considered for clustering
• Let's consider the number of clusters k = 2
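Scaled variables are obtained with min-max scaling, (x - min(x)) / (max(x) - min(x)), which maps every variable onto the range 0 to 1. A minimal sketch, assuming a plain list of numbers per variable; `min_max_scale` is a hypothetical helper, and returning zeros for a constant column is a choice made here, not something the slides specify.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

heights = [185, 170, 168, 179, 182, 188, 180, 180, 183, 180, 180, 177]
scaled = min_max_scale(heights)
print(scaled[0])  # 0.85, since (185 - 168) / (188 - 168)
```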
Step 2: Initialize cluster centers
• Let's initialize the cluster centers with the first two observations
Step 3: Calculate Euclidean distance
• The Euclidean distance between an observation and initial cluster centers 1 and 2 is calculated.
• Each observation is assigned to the cluster at minimum Euclidean distance.
Example
Step 3 (continued)
First two observations:
Height  Weight
185     72
170     56
Euclidean distance from each of the clusters is calculated:
Observation  Euclidean distance from Cluster 1        Euclidean distance from Cluster 2        Cluster assignment
(185, 72)    SQRT[(185-185)^2 + (72-72)^2] = 0        SQRT[(185-170)^2 + (72-56)^2] = 21.93    1
(170, 56)    SQRT[(170-185)^2 + (56-72)^2] = 21.93    SQRT[(170-170)^2 + (56-56)^2] = 0        2
Updated centers:
Cluster  Height  Weight
K1       185     72
K2       170     56
There is no change in the centers, as the same two observations were used as the initial centers.
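The distance used in this step is the ordinary Euclidean distance, the square root of the summed squared differences. The sketch below spells it out for the two initial centers; `euclidean` is a hypothetical helper name.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

k1, k2 = (185, 72), (170, 56)
print(euclidean(k1, k1))            # 0.0
print(round(euclidean(k1, k2), 2))  # 21.93
```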
Example
Step 4: Move on to the next observation, calculate the Euclidean distances, assign cluster membership, and update the cluster centers
Next observation:
Height  Weight
168     60
Euclidean distance from Cluster 1: SQRT[(168-185)^2 + (60-72)^2] = 20.809
Euclidean distance from Cluster 2: SQRT[(168-170)^2 + (60-56)^2] = 4.472
Cluster assignment: 2
• Since the distance from cluster 2 is the minimum, the observation is assigned to cluster 2.
• Now revise the cluster centers as the mean of the member observations' Height and Weight.
• The addition is only to cluster 2, so only the centroid of cluster 2 is updated:
Updated cluster centers:
Cluster  Height               Weight
K1       185                  72
K2       (170 + 168)/2 = 169  (56 + 60)/2 = 58
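The centroid update is a running mean, so it can be computed without re-reading all earlier members. A minimal sketch, assuming the cluster size before the addition is tracked alongside the center; `update_center` is a hypothetical helper name.

```python
def update_center(center, count, point):
    """Fold `point` into a running mean.

    `count` is the number of members before the addition; the function
    returns the new center and the new member count.
    """
    n = count + 1
    new_center = tuple(c + (x - c) / n for c, x in zip(center, point))
    return new_center, n

# Cluster 2 holds one member, (170, 56), and receives (168, 60).
center, size = update_center((170, 56), 1, (168, 60))
print(center, size)  # (169.0, 58.0) 2
```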
Example
Step 5: Repeat step 4: calculate the Euclidean distances for the next observation, assign it to the cluster at minimum distance, and update the cluster centers, until all observations are assigned a cluster membership
Next observation:
Height  Weight
179     68
Euclidean distance from Cluster 1: SQRT[(179-185)^2 + (68-72)^2] = 7.21
Euclidean distance from Cluster 2: SQRT[(179-169)^2 + (68-58)^2] = 14.14
Cluster assignment: 1
o Since the distance from cluster 1 is the minimum, the observation is assigned to cluster 1.
o Now revise the cluster centroid as the mean of the member observations' Height and Weight.
o The addition is only to cluster 1, so only the centroid of cluster 1 is updated:
Updated cluster centers:
Cluster  Height               Weight
K1       (185 + 179)/2 = 182  (72 + 68)/2 = 70
K2       169                  58
Example
Step 6: Draw the cluster plot to see how the clusters are distributed. The less the overlap between clusters, the better the distribution and cluster assignments.
Final cluster centers:
Cluster  Height  Weight
K = 1    182.8   72
K = 2    169     58
Cluster plot (final assignments): silhouette = 0.8, indicating very good quality clusters
The closer the silhouette score is to 1, the better the quality of the clusters.
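For reference, the silhouette score of one observation is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster; the overall score is the average over all observations. The sketch below is a plain-Python illustration under those definitions. The four toy points are made up, and scoring single-member clusters as 0 is a common convention assumed here.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette width over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a single-member cluster contributes 0
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)

toy = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(toy, [0, 0, 1, 1])  # tight, well-separated groups
bad = silhouette_score(toy, [0, 1, 0, 1])   # groups cut across the gap
print(good > 0.8, bad < 0)  # True True
```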
Example
Standard Tuning Parameters
Number of clusters (K):
• The desired number of clusters
• Suggested range: 3 to 5
• The actual number in the output could be smaller if there are no divisible clusters in the data
• This parameter input can be automated using the silhouette score (explained in later slides)
Max iterations:
• The maximum number of k-means iterations to split clusters
• By default this value should be set to 20
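One way to automate the choice of K is to run the clustering for each candidate value and keep the one with the largest silhouette score. The sketch below is only an illustration of that idea: it uses an ordinary batch k-means capped at `max_iter` passes, seeds the centers with the first k points (an assumption made for reproducibility, not a random start), and runs on nine made-up points forming three groups. All function names are hypothetical.

```python
import math

def kmeans(points, k, max_iter=20):
    """Batch k-means seeded with the first k points; stops when labels settle."""
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width; single-member clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # assumption: treat a single group as the worst case
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)

def best_k(points, candidates, max_iter=20):
    """Return the candidate k with the largest silhouette score, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k, max_iter)) for k in candidates}
    return max(scores, key=scores.get), scores

points = [(0, 0), (10, 10), (20, 0), (0, 1), (1, 0),
          (10, 11), (11, 10), (20, 1), (21, 0)]
k, scores = best_k(points, candidates=[2, 3, 4, 5])
print(k)  # 3
```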
Sample UI For Input Variables & Parameters Selection & Output
Sample UI for selecting predictors and applying tuning parameters: For Two Predictors
Select the variables you would like to use as predictors to build clusters
Height
Weight
BMI
21
Tuning parameters
Number of clusters
Maximum iterations
• The silhouette score is another useful criterion for assessing the natural and optimal number of clusters as well as for checking the overall quality of the partition
• The largest silhouette score, over different K, indicates the best number of clusters
Height (cm)  Weight (kg)  BMI  Cluster number
158          60           23   1
160          65           25   2
170          70           26   2
149          50           21   1
180          80           27   3
165          80           28   3
200          90           23   1
Each customer is assigned a cluster membership as shown in the table.
Cluster plot (axes: Height, Weight): silhouette = 0.7, indicating good quality clusters
Output UI: For Two Predictors
As the clusters are built using only 2 predictors here, the scatter plot axes reflect the actual predictors, i.e. Height and Weight, instead of principal components.
Again, the less the overlap in the cluster outlines, the better the cluster assignment.
Alternatively, the silhouette score can be checked to evaluate how the clusters are partitioned: the closer this value is to 1, the better the partition quality.
Sample UI for Selecting Predictors And Applying Tuning Parameters: For Four Predictors
Select the variables you would like to use as predictors to build clusters
Purchase amount
Purchase frequency
Total purchase quantity
Annual income
Website visits
21
Tuning parameters
Number of clusters
Maximum iterations
• The silhouette score is another useful criterion for assessing the natural and optimal number of clusters as well as for checking the overall quality of the partition
• The largest silhouette score, over different K, indicates the best number of clusters
Customer ID  Purchase amount (per month)  Purchase frequency (per month)  Total purchase quantity  Annual income slab  Website visits (per month)  Cluster number
1            5k to 10k                    2                               6                        2 lac to 4 lac      3 to 6                      2
2            1k to 5k                     1                               4                        1 lac to 2 lac      <3                          1
3            10k to 15k                   3                               8                        4 lac to 6 lac      6 to 10                     3
4            5k to 10k                    2                               6                        2 lac to 4 lac      3 to 6                      2
5            10k to 15k                   3                               8                        4 lac to 6 lac      6 to 10                     3
6            1k to 5k                     1                               2                        1 lac to 2 lac      <3                          1
7            5k to 10k                    2                               6                        2 lac to 4 lac      3 to 6                      2
8            1k to 5k                     1                               2                        1 lac to 2 lac      <3                          1
9            1k to 5k                     1                               2                        1 lac to 2 lac      <3                          1
10           >20k                         4                               32                       10 lac to 15 lac    >10                         3
Each customer is assigned a cluster membership as shown in the table.
Cluster plot (axes: first principal component, second principal component)
Output UI: For Four Predictors
The 2D cluster plot shows how the clusters are distributed.
In this case, the axes reflect the first two principal components (see the definition below) instead of the actual predictors, as the number of predictors is greater than 2. In a 3D plot, the first three principal components are shown. The less the overlap between clusters, the better the cluster assignment.
The silhouette score can also be checked to evaluate how the clusters are partitioned: the closer this value is to 1, the better the partition quality.
Whenever there are more than 2 predictors as input, the axes reflect principal components instead of the actual predictors.
Principal components are linear combinations of the original predictors that capture the maximum variance in the data set.
Most of the variance in the data is typically explained by the first two or three principal components, so the remaining components can be ignored for plotting.
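To make the idea concrete, the sketch below extracts the first two principal components from a small table and reports the share of variance each one explains. It is an illustration only: it uses power iteration with deflation on the covariance matrix instead of a library routine, the six rows (purchase frequency, purchase quantity, website visits) are made-up values, and all function names are hypothetical.

```python
import math

def covariance(rows):
    """Return (covariance matrix, mean-centered rows) for a list of numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def top_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the largest eigenvalue and its unit eigenvector."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v

def principal_components(rows, n_components=2):
    """Return (components, explained-variance shares, projected rows)."""
    cov, centered = covariance(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    components, shares = [], []
    for _ in range(n_components):
        value, v = top_eigenpair(cov)
        components.append(v)
        shares.append(value / total)
        # Deflation: remove the variance already captured along v.
        cov = [[cov[i][j] - value * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(x * w for x, w in zip(c, v)) for v in components]
              for c in centered]
    return components, shares, scores

rows = [(2, 6, 4), (1, 4, 2), (3, 8, 8), (2, 6, 5), (1, 2, 1), (4, 32, 12)]
components, shares, scores = principal_components(rows)
print([round(s, 3) for s in shares])  # share of variance per component
```

Each row of `scores` holds the two coordinates that would be plotted on the axes of a 2D cluster plot.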
Other Sample Output Formats
silhouette = -0.5: indicating poor quality clusters
silhouette = 0.3: average quality clusters
silhouette = 0.7: indicating good quality clusters
• Clusters with a silhouette score closer to 1 are more desirable, so this index can be used to measure the goodness of clusters.
Limitations
• The number of clusters, k, must be determined beforehand. Ideally the algorithm should suggest this number automatically for better user friendliness.
• It does not yield the same result with each run, since the resulting clusters depend on the initial random choice of cluster centers.
• If the data is input in a different order, it may produce different clusters when the number of data points is small; hence the number of data points must be large enough.
• It has been suggested that 2^m (where m = number of clustering variables) can be used as a rule of thumb for the minimum sample size.
• K-means is suitable only for numeric data.
• The scale of the variables influences the Euclidean distance, so variable standardization becomes necessary.
• Empty clusters can be obtained if no points are allocated to a cluster during the assignment step.
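The effect of scale on Euclidean distance is easy to see with two variables measured in very different units. In the sketch below, three made-up applicants are described by annual income and employment tenure in years; the values, the choice of z-score standardization, and the helper names are all assumptions for illustration.

```python
import math
from statistics import mean, pstdev

def standardize(column):
    """Z-score a column: subtract the mean, divide by the standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]

def nearest_to_first(points):
    """Index of the point closest to points[0]."""
    return min(range(1, len(points)), key=lambda i: math.dist(points[0], points[i]))

income = [45000, 45500, 45800]
tenure = [2, 21, 2]

raw = list(zip(income, tenure))
scaled = list(zip(standardize(income), standardize(tenure)))

# On raw values income dominates the distance, so applicant 1 looks closest to
# applicant 0 despite a 19-year tenure gap; after standardization applicant 2,
# with the same tenure, is closest.
print(nearest_to_first(raw), nearest_to_first(scaled))  # 1 2
```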
Applications & Business Use Cases
General applications of K-means Clustering
• Some examples of use cases are:
• Behavioral segmentation:
• Segment customers by purchase history
• Segment users by activities on application, website, or platform
• Define personas based on interests
• Create consumer profiles based on activity monitoring
• Some other general applications:
• Pattern recognition, market segmentation, classification analysis, artificial intelligence, image processing, astronomy, agriculture, and many others.
Use case 1
• Business problem:
• Grouping loan applicants into high/medium/low risk applicants based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt to income ratio, etc.
• Business benefit:
• Once the segments are identified, the bank will have a loan applicants' dataset with each applicant labeled as high/medium/low risk.
• Based on these labels, the bank can easily decide whether to give a loan to an applicant and, if so, what credit limit and interest rate each applicant is eligible for given the amount of risk involved.
Use case 1
Input dataset:
Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure
1039153      21000        701.73               105000         9                     5                 4
1069697      15000        483.38               92000          11                    5                 2
1068120      25600        824.96               110000         10                    9                 2
563175       23000        534.94               80000          9                     2                 12
562842       19750        483.65               57228          11                    3                 21
562681       25000        571.78               113000         10                    0                 9
562404       21250        471.2                31008          12                    1                 12
700159       14400        448.99               82000          20                    6                 6
696484       10000        241.33               45000          18                    8                 2
702598       11700        381.61               45192          20                    7                 3
702470       10000        243.29               38000          17                    9                 7
702373       4800         144.77               54000          19                    8                 2
701975       12500        455.81               43560          15                    8                 4
Use case 1
Output: Each record will have the cluster (segment) assignment as shown below:
Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Cluster
1039153      21000        701.73               105000         9                     5                 4                  Medium
1069697      15000        483.38               92000          11                    5                 2                  Medium
1068120      25600        824.96               110000         10                    9                 2                  Medium
563175       23000        534.94               80000          9                     2                 12                 Low
562842       19750        483.65               57228          11                    3                 21                 Low
562681       25000        571.78               113000         10                    0                 9                  Low
562404       21250        471.2                31008          12                    1                 12                 Low
700159       14400        448.99               82000          20                    6                 6                  High
696484       10000        241.33               45000          18                    8                 2                  High
702598       11700        381.61               45192          20                    7                 3                  High
702470       10000        243.29               38000          17                    9                 7                  High
702373       4800         144.77               54000          19                    8                 2                  High
701975       12500        455.81               43560          15                    8                 4                  High
Use case 1
Output: Cluster profiles:
Cluster  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Risk segment
1        10447.30     304.87               66467.74       9.58                  1.69              16.82              Low
2        21391.58     598.54               94912.59       12.37                 5.98              4.58               Medium
3        7521.32      227.43               60935.28       16.55                 6.91              4.01               High
As can be seen in the table above, the high, medium, and low risk segments have distinctive characteristics.
The high risk segment has the highest likelihood of delinquency, the highest debt to income ratio, and the lowest employment tenure compared to the other two segments.
The low risk segment exhibits exactly the opposite pattern, i.e. the lowest debt to income ratio, the lowest delinquency, and the highest employment tenure compared to the other two segments.
Hence delinquency, employment tenure, and debt to income ratio are the determining factors when it comes to segmenting loan applicants.
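A cluster profile is the per-segment average of each variable. The sketch below computes it for three of the variables using only the thirteen sample records listed in the output table, so its averages will not match the profile figures, which presumably come from a larger dataset. The column names and `cluster_profiles` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (debt to income ratio, times delinquent, employment tenure, segment)
rows = [
    (9, 5, 4, "Medium"), (11, 5, 2, "Medium"), (10, 9, 2, "Medium"),
    (9, 2, 12, "Low"), (11, 3, 21, "Low"), (10, 0, 9, "Low"), (12, 1, 12, "Low"),
    (20, 6, 6, "High"), (18, 8, 2, "High"), (20, 7, 3, "High"),
    (17, 9, 7, "High"), (19, 8, 2, "High"), (15, 8, 4, "High"),
]

def cluster_profiles(rows, names=("dti", "delinquent", "tenure")):
    """Average every numeric column within each segment."""
    groups = defaultdict(list)
    for *values, segment in rows:
        groups[segment].append(values)
    return {segment: dict(zip(names, (mean(col) for col in zip(*members))))
            for segment, members in groups.items()}

profiles = cluster_profiles(rows)
for segment in ("Low", "Medium", "High"):
    print(segment, profiles[segment])
```

Even on this small sample the pattern described for the segments is visible: the Low segment has the longest tenure and the fewest delinquencies, and the High segment has the highest debt to income ratio.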
Use case 1
Output: Cluster distribution:
In the cluster distribution plot, there is negligible overlap in the cluster outlines, so we can say that the cluster assignment is good in our case.
Clusters with an average silhouette width closer to 1 are more desirable, so this index can be used to test the quality of the cluster distribution.
silhouette = 0.6, indicating good quality clusters
Use case 2
Business problem:
• Organizing customers into groups/segments based on similar traits, product preferences, and expectations
• Segments are constructed on the basis of the customers' demographic characteristics, psychographics, past behavior, and product use behaviors
Business benefit:
• Once segments are identified, marketing messages and even products can be customized for each segment.
• The better the segment(s) chosen for targeting by a particular organization, the more successful it is assumed to be in the marketplace.
Use case 3
Business problem:
• Discount analysis and customer retention: visualize 'segments of sales groups based on discount behavior' and 'customer churn: segments of customers on the verge of leaving'
Business benefit:
• The business marketing team can focus efficiently on risky customer segments in order to prevent them from churning/leaving
• Sales team segments that are facing challenges under the current discounting strategy can be identified, and the deal negotiation strategy can be improved/optimized for them.
Want to Learn More?
Get in touch with us at support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

Weitere ähnliche Inhalte

Was ist angesagt?

Design cycles of pattern recognition
Design cycles of pattern recognitionDesign cycles of pattern recognition
Design cycles of pattern recognitionAl Mamun
 
Data Communication Principles
Data Communication PrinciplesData Communication Principles
Data Communication PrinciplesKamal Acharya
 
Cloud computing notes
Cloud computing notesCloud computing notes
Cloud computing notesSrinivasa Rao
 
Cloud Computing and Service oriented Architecture (SOA)
Cloud Computing and Service oriented Architecture (SOA)Cloud Computing and Service oriented Architecture (SOA)
Cloud Computing and Service oriented Architecture (SOA)Ravindra Dastikop
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine LearningVARUN KUMAR
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bankpkaviya
 
recursive transition_networks
recursive transition_networksrecursive transition_networks
recursive transition_networksRajendran
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networksAkash Goel
 
Chest X-ray Pneumonia Classification with Deep Learning
Chest X-ray Pneumonia Classification with Deep LearningChest X-ray Pneumonia Classification with Deep Learning
Chest X-ray Pneumonia Classification with Deep LearningBaoTramDuong2
 
Network Layer,Computer Networks
Network Layer,Computer NetworksNetwork Layer,Computer Networks
Network Layer,Computer Networksguesta81d4b
 
Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Jitendra s Rathore
 
Biology protein structure in cloud computing
Biology protein structure in cloud computingBiology protein structure in cloud computing
Biology protein structure in cloud computinggaurav jain
 
Cloud Computing and Service oriented Architecture
Cloud Computing and Service oriented Architecture Cloud Computing and Service oriented Architecture
Cloud Computing and Service oriented Architecture Ravindra Dastikop
 
Network Layer design Issues.pptx
Network Layer design Issues.pptxNetwork Layer design Issues.pptx
Network Layer design Issues.pptxAcad
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented CommunicationDilum Bandara
 
The Data Link Layer
The Data Link LayerThe Data Link Layer
The Data Link Layerrobbbminson
 

Was ist angesagt? (20)

HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
Design cycles of pattern recognition
Design cycles of pattern recognitionDesign cycles of pattern recognition
Design cycles of pattern recognition
 
Data Communication Principles
Data Communication PrinciplesData Communication Principles
Data Communication Principles
 
Cloud computing notes
Cloud computing notesCloud computing notes
Cloud computing notes
 
Cloud Computing and Service oriented Architecture (SOA)
Cloud Computing and Service oriented Architecture (SOA)Cloud Computing and Service oriented Architecture (SOA)
Cloud Computing and Service oriented Architecture (SOA)
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine Learning
 
Unit v
Unit vUnit v
Unit v
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bank
 
recursive transition_networks
recursive transition_networksrecursive transition_networks
recursive transition_networks
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networks
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Chest X-ray Pneumonia Classification with Deep Learning
Chest X-ray Pneumonia Classification with Deep LearningChest X-ray Pneumonia Classification with Deep Learning
Chest X-ray Pneumonia Classification with Deep Learning
 
Network Layer,Computer Networks
Network Layer,Computer NetworksNetwork Layer,Computer Networks
Network Layer,Computer Networks
 
Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2
 
Biology protein structure in cloud computing
Biology protein structure in cloud computingBiology protein structure in cloud computing
Biology protein structure in cloud computing
 
Cloud Computing and Service oriented Architecture
Cloud Computing and Service oriented Architecture Cloud Computing and Service oriented Architecture
Cloud Computing and Service oriented Architecture
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Network Layer design Issues.pptx
Network Layer design Issues.pptxNetwork Layer design Issues.pptx
Network Layer design Issues.pptx
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented Communication
 
The Data Link Layer
The Data Link LayerThe Data Link Layer
The Data Link Layer
 

Ähnlich wie What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to Analyze Data?

What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...Smarten Augmented Analytics
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxnikshaikh786
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniquestalktoharry
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningPyingkodi Maran
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning ClusteringRupak Roy
 
Lecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptLecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptSyedNahin1
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxShwetapadmaBabu1
 
Clustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesClustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesKush Kulshrestha
 
K – means cluster analysis.pptx
K – means cluster analysis.pptxK – means cluster analysis.pptx
K – means cluster analysis.pptxagniva pradhan
 
K-Means Clustering Algorithm.pptx
K-Means Clustering Algorithm.pptxK-Means Clustering Algorithm.pptx
K-Means Clustering Algorithm.pptxJebaRaj26
 
ML basic &amp; clustering
ML basic &amp; clusteringML basic &amp; clustering
ML basic &amp; clusteringmonalisa Das
 
Cluster spss week7
Cluster spss week7Cluster spss week7
Cluster spss week7Birat Sharma
 
Clustering &amp; classification
Clustering &amp; classificationClustering &amp; classification
Clustering &amp; classificationJamshed Khan
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptionsrefedey275
 
Spss tutorial-cluster-analysis
Spss tutorial-cluster-analysisSpss tutorial-cluster-analysis
Spss tutorial-cluster-analysisAnimesh Kumar
 

Ähnlich wie What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to Analyze Data? (20)

What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptx
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning Clustering
 
Lecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptLecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.ppt
 
Lec13 Clustering.pptx
Lec13 Clustering.pptxLec13 Clustering.pptx
Lec13 Clustering.pptx
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
Data Mining Lecture_8(a).pptx
Data Mining Lecture_8(a).pptxData Mining Lecture_8(a).pptx
Data Mining Lecture_8(a).pptx
 
Clustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesClustering - Machine Learning Techniques
Clustering - Machine Learning Techniques
 
07 learning
07 learning07 learning
07 learning
 
K – means cluster analysis.pptx
K – means cluster analysis.pptxK – means cluster analysis.pptx
K – means cluster analysis.pptx
 
K-Means Clustering Algorithm.pptx
K-Means Clustering Algorithm.pptxK-Means Clustering Algorithm.pptx
K-Means Clustering Algorithm.pptx
 
ML basic &amp; clustering
ML basic &amp; clusteringML basic &amp; clustering
ML basic &amp; clustering
 
Cluster spss week7
Cluster spss week7Cluster spss week7
Cluster spss week7
 
kmean clustering
kmean clusteringkmean clustering
kmean clustering
 
Clustering &amp; classification
Clustering &amp; classificationClustering &amp; classification
Clustering &amp; classification
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
Spss tutorial-cluster-analysis
Spss tutorial-cluster-analysisSpss tutorial-cluster-analysis
Spss tutorial-cluster-analysis
 

Mehr von Smarten Augmented Analytics

Crime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenCrime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenSmarten Augmented Analytics
 
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...Smarten Augmented Analytics
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...Smarten Augmented Analytics
 
What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?Smarten Augmented Analytics
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?Smarten Augmented Analytics
 
Students' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenStudents' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenSmarten Augmented Analytics
 
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values  Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values Smarten Augmented Analytics
 
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Smarten Augmented Analytics
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...Smarten Augmented Analytics
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...Smarten Augmented Analytics
 
Fraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenFraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenSmarten Augmented Analytics
 

What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to Analyze Data?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series for Citizen Data Scientists. Journey Towards Augmented Analytics
  • 2. K-Means Clustering: Parameter Tuning & Use Cases
  • 3. What is covered: Terminologies; Introduction & example; Standard input/tuning parameters & sample UI; Sample output UI; Interpretation of output; Limitations; Business use cases
  • 5. What is it used for? Clustering is a process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another and as similar as possible within each group. It is simply a grouping of similar things (data points). For example, objects within group 1 (cluster 1) should be as similar as possible, while an object in group 1 should differ markedly from an object in group 2. The attributes of the objects decide which objects are grouped together, so a natural grouping of the data points can be achieved.
  • 6. Some examples. A few examples for more clarity:
      - Loan applicants in a bank can be grouped into low, medium and high risk applicants based on their age, annual income, employment tenure, loan amount, times delinquent, etc., using the K-means clustering algorithm.
      - Users of a movie ticket booking website can be grouped into movie freaks, moderate watchers and rare watchers based on their past ticket purchase behavior, such as days since the last movie seen, average number of tickets booked each time, frequency of bookings per month, etc.
      - Retail customers can be grouped into loyal, infrequent and rare customers based on their retail outlet/website visits per month, purchase amount per month, purchase frequency per month, etc.
      K-means is used to find groups that have not been explicitly labeled in the data. This can confirm business assumptions about what types of groups exist, or identify unknown groups in complex data sets. Once the algorithm has been run and the groups are defined, any new data point can easily be assigned to the correct group.
  • 8. How it works: steps
      Step 1: Decide on the value of k (the number of clusters, or groups) and on the input variables. The silhouette score can be used to determine k.
      Step 2: Scale the data using (x - min(x)) / (max(x) - min(x)) and initialize the cluster centers: randomly select k observations from the scaled data and treat them as the initial cluster centers.
      Step 3: Calculate the Euclidean distance between an observation and each initial cluster center. Each observation is assigned to the cluster whose center is at the minimum distance.
      Step 4: Move on to the next observation, calculate its Euclidean distance to each center, assign it a cluster membership based on minimum distance as in step 3, and update the cluster centers.
      Step 5: Repeat step 4 until all observations are assigned a cluster membership.
      Step 6: Check the cluster plot and the silhouette score to measure the goodness of the clusters generated.
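The scaling formula in step 2 and the distance-based assignment in step 3 can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool's implementation; the helper names `min_max_scale`, `euclidean` and `nearest_center` are hypothetical.

```python
import math

def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0 (assumption).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest_center(point, centers):
    """Index of the center at minimum Euclidean distance from the point."""
    distances = [euclidean(point, c) for c in centers]
    return distances.index(min(distances))

# First four height/weight rows from the deck's data sample.
heights = [185, 170, 168, 179]
weights = [72, 56, 60, 68]
scaled = list(zip(min_max_scale(heights), min_max_scale(weights)))
```

After scaling, every variable lies between 0 and 1, so no single variable dominates the distance purely because of its units.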
  • 9. Example
      Data sample:
      Height  Weight
      185     72
      170     56
      168     60
      179     68
      182     72
      188     77
      180     71
      180     70
      183     84
      180     88
      180     67
      177     76
      Step 1: Input. Scaled variables and the number of clusters (k). In this example only two variables, height and weight, are considered for clustering, and the number of clusters is 2.
      Step 2: Initialize cluster centers. The first two observations are used as the initial cluster centers:
      Cluster  Height  Weight
      K1       185     72
      K2       170     56
      Step 3: Calculate Euclidean distance. The Euclidean distance between an observation and initial cluster centers 1 and 2 is calculated, and each observation is assigned to the cluster at minimum distance.
  • 10. Example. Step 3 (continued): the Euclidean distance from each cluster center is calculated for the first two observations.
      Observation  Distance from cluster 1                   Distance from cluster 2                   Assignment
      (185, 72)    SQRT[(185-185)^2 + (72-72)^2] = 0         SQRT[(185-170)^2 + (72-56)^2] = 21.93     1
      (170, 56)    SQRT[(170-185)^2 + (56-72)^2] = 21.93     SQRT[(170-170)^2 + (56-56)^2] = 0         2
      Updated centers: K1 = (185, 72), K2 = (170, 56). The centers do not change, because these same two observations were used as the initial centers.
  • 11. Example. Step 4: move on to the next observation, calculate the Euclidean distance, assign cluster membership and update the cluster centers.
      Next observation: (168, 60)
      Distance from cluster 1: SQRT[(168-185)^2 + (60-72)^2] = 20.808
      Distance from cluster 2: SQRT[(168-170)^2 + (60-56)^2] = 4.472
      Since the distance from cluster 2 is the minimum, the observation is assigned to cluster 2. The cluster centers are then revised to the mean height and weight of the member observations. Only cluster 2 gained a member, so only its centroid is updated:
      Cluster  Height               Weight
      K1       185                  72
      K2       (170 + 168)/2 = 169  (56 + 60)/2 = 58
  • 12. Example. Step 5: repeat step 4 (calculate the Euclidean distance for the next observation, assign it based on minimum distance and update the cluster centers) until all observations are assigned a cluster membership.
      Next observation: (179, 68)
      Distance from cluster 1: SQRT[(179-185)^2 + (68-72)^2] = 7.21
      Distance from cluster 2, measured from its updated center (169, 58): SQRT[(179-169)^2 + (68-58)^2] = 14.14
      Since the distance from cluster 1 is the minimum, the observation is assigned to cluster 1. The cluster centroids are then revised to the mean height and weight of the member observations. Only cluster 1 gained a member, so only its centroid is updated:
      Cluster  Height               Weight
      K1       (185 + 179)/2 = 182  (72 + 68)/2 = 70
      K2       169                  58
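The assign-then-update loop walked through in the example slides can be reproduced with a short sketch. It follows the deck's sequential variant (each new observation joins the nearest center, and that center is recomputed as the mean of its members) and, like the worked example, uses raw rather than scaled values. The function name `sequential_kmeans` is hypothetical, and cluster labels are 0-based here, so label 0 corresponds to K1 and label 1 to K2.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def sequential_kmeans(points, k):
    """Seed the clusters with the first k points, then process the rest in order:
    assign each point to the nearest center and recompute that center as the
    mean of all points assigned to it so far."""
    members = [[p] for p in points[:k]]
    centers = [tuple(float(v) for v in p) for p in points[:k]]
    labels = list(range(k))
    for p in points[k:]:
        dists = [euclidean(p, c) for c in centers]
        j = dists.index(min(dists))          # nearest center
        members[j].append(p)
        n = len(members[j])
        centers[j] = tuple(sum(col) / n for col in zip(*members[j]))
        labels.append(j)
    return centers, labels

# First four observations from the data sample, as in the walk-through.
data = [(185, 72), (170, 56), (168, 60), (179, 68)]
centers, labels = sequential_kmeans(data, k=2)
print(centers)  # [(182.0, 70.0), (169.0, 58.0)]
print(labels)   # [0, 1, 1, 0]
```

Running the same function on the first seven rows of the data sample gives centers (182.8, 72) and (169, 58), the values listed on slide 13.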
  • 13. Example. Step 6: draw the cluster plot to see how the clusters are distributed. The less the clusters overlap, the better the distribution and the cluster assignments.
      Final cluster centers:
      Cluster  Height  Weight
      K=1      182.8   72
      K=2      169     58
      [Cluster plot of the final assignments] silhouette = 0.8, indicating very good quality clusters. The closer the silhouette score is to 1, the better the quality of the clusters.
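The deck uses the silhouette score without defining it. Under the standard definition, each point gets s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the overall score is the mean of s over all points. The sketch below implements that definition in plain Python. The function name and toy points are hypothetical, and the sketch is not the tool's own code.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def silhouette_score(points, labels):
    """Mean silhouette over all points, in the range [-1, 1]."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Two tight, well-separated pairs of points (toy data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # sensible grouping, close to 1
bad = silhouette_score(points, [0, 1, 0, 1])   # mixed-up grouping, negative
```

A score near 1 means points sit much closer to their own cluster than to the nearest other cluster; a negative score means many points would fit better elsewhere.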
  • 15. Standard tuning parameters
      Number of clusters (K): the desired number of clusters. Suggested range: 3 to 5. The actual number in the output can be smaller if the data contains no divisible clusters. This input can be automated using the silhouette score (explained in later slides).
      Max iterations: the maximum number of k-means iterations used to split clusters. By default this value should be set to 20.
  • 16. Sample UI For Input Variables & Parameters Selection & Output
  • 17. Sample UI for selecting predictors and applying tuning parameters: for two predictors
      Select the variables you would like to use as predictors to build clusters: Height, Weight, BMI
      Tuning parameters: Number of clusters, Maximum iterations
      - The silhouette score is another useful criterion for assessing the natural and optimal number of clusters, as well as for checking the overall quality of the partition.
      - The largest silhouette score, over different K, indicates the best number of clusters.
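The rule that the largest silhouette score over different K indicates the best number of clusters can be tried out with a small pure-Python sketch. Several things here are assumptions made for illustration: the batch (Lloyd-style) `kmeans` loop, the deterministic farthest-point seeding used in place of random initialization so that the result is repeatable, the helper names, and the toy `blobs` data. The default of 20 iterations mirrors the deck's suggested setting.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def farthest_point_init(points, k):
    # Assumption: deterministic seeding instead of random selection. Start from
    # the first point, then keep adding the point farthest from its nearest center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=20):
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def silhouette(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute a score of 0
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, candidates):
    """Return the candidate k with the largest silhouette score, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)[0]) for k in candidates}
    return max(scores, key=scores.get), scores

# Toy data: three well-separated groups of four points each.
blobs = [(1, 1), (1, 2), (2, 1), (2, 2),
         (10, 1), (10, 2), (11, 1), (11, 2),
         (5, 10), (5, 11), (6, 10), (6, 11)]
chosen, scores = best_k(blobs, candidates=[2, 3, 4])
print(chosen)  # 3
```

On real data the scores for neighboring values of K are often close, so the largest score is better treated as a suggestion than as a definitive answer.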
  • 18. Output UI: for two predictors
      Each customer is assigned a cluster membership, as shown in the table:
      Height (cm)  Weight (kg)  BMI  Cluster number
      158          60           23   1
      160          65           25   2
      170          70           26   2
      149          50           21   1
      180          80           27   3
      165          80           28   3
      200          90           23   1
      [Scatter plot: Height vs Weight] silhouette = 0.7, indicating good quality clusters.
      Because the clusters are built using only 2 predictors here, the scatter plot axes show the actual predictors (Height and Weight) instead of principal components. Again, the less the cluster outlines overlap, the better the cluster assignment. Alternatively, the silhouette score can be checked to evaluate how the clusters are partitioned: the closer this value is to 1, the better the partition quality.
  • 19. Sample UI for selecting predictors and applying tuning parameters: for four predictors
      Select the variables you would like to use as predictors to build clusters: Purchase amount, Purchase frequency, Total purchase quantity, Annual income, Website visits
      Tuning parameters: Number of clusters, Maximum iterations
      - The silhouette score is another useful criterion for assessing the natural and optimal number of clusters, as well as for checking the overall quality of the partition.
      - The largest silhouette score, over different K, indicates the best number of clusters.
  • 20. Output UI: for four predictors
      Each customer is assigned a cluster membership, as shown in the table:
      Customer ID  Purchase amount (per month)  Purchase frequency (per month)  Total purchase quantity  Annual income slab  Website visits (per month)  Cluster number
      1            5k to 10k                    2                               6                        2 lac to 4 lac      3 to 6                      2
      2            1k to 5k                     1                               4                        1 lac to 2 lac      <3                          1
      3            10 to 15k                    3                               8                        4 lac to 6 lac      6 to 10                     3
      4            5k to 10k                    2                               6                        2 lac to 4 lac      3 to 6                      2
      5            10 to 15k                    3                               8                        4 lac to 6 lac      6 to 10                     3
      6            1k to 5k                     1                               2                        1 lac to 2 lac      <3                          1
      7            5k to 10k                    2                               6                        2 lac to 4 lac      3 to 6                      2
      8            1k to 5k                     1                               2                        1 lac to 2 lac      <3                          1
      9            1k to 5k                     1                               2                        1 lac to 2 lac      <3                          1
      10           >20k                         4                               32                       10 lac to 15 lac    >10                         3
      [2D cluster plot: first principal component vs second principal component]
      The 2D cluster plot shows the distribution of the clusters. Whenever there are more than 2 predictors as input, the axes show principal components instead of the actual predictors: the first two principal components in a 2D plot, the first three in a 3D plot. The less the clusters overlap, the better the cluster assignment. The silhouette score can also be checked to evaluate how the clusters are partitioned: the closer this value is to 1, the better the partition quality.
      Principal components are linear combinations of the original predictors that capture the maximum variance in the data set. Most of the variance in the data is usually explained by the first three principal components, so the remaining components can be ignored for plotting.
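To make "a linear combination of the original predictors that captures the maximum variance" concrete, here is a minimal sketch that finds the first principal component by power iteration on the covariance matrix. The method is swapped in for brevity; plotting tools typically use a full eigen- or singular value decomposition. The function name and toy rows are hypothetical.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (direction, scores): the unit vector of maximum variance and the
    projection of each centered row onto it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the predictors.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Caveat: this fails if the start vector is orthogonal to the answer.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Toy data in which the second predictor is exactly twice the first.
rows = [(1, 2), (2, 4), (3, 6)]
direction, scores = first_principal_component(rows)
# direction is proportional to (1, 2): all the variance lies along that line.
```

The `scores` values are what a cluster plot would place on its first axis. A second component would be found the same way after removing the variance explained by the first.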
  • 22. Other sample output formats
      silhouette = -0.5: poor quality clusters
      silhouette = 0.3: average quality clusters
      silhouette = 0.7: good quality clusters
      Clusters with a silhouette score closer to 1 are more desirable, so this index can be used to measure the goodness of clusters.
  • 24. Limitations
      - The number of clusters, k, must be determined beforehand. Ideally the tool should suggest this number automatically, for better user friendliness.
      - It does not yield the same result on each run, since the resulting clusters depend on the initial random assignment of the group centers.
      - If the data is input in a different order, it may produce different clusters when the number of data points is small, so the number of data points must be large enough.
      - It has been suggested that 2^m (where m = number of clustering variables) can be used as a rule of thumb for the minimum sample size.
      - K-means is suitable only for numeric data.
      - The scale of the data points influences the Euclidean distance, so variable standardization becomes necessary.
      - Empty clusters can be obtained if no points are allocated to a cluster during the assignment step.
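The point about scale can be illustrated with a small sketch. The customers below are invented for illustration (age in years, annual income in currency units), and the helper names are hypothetical. On raw values income dominates the distance because its numbers are larger. After min-max scaling both variables contribute comparably, and the nearest neighbor changes.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def min_max_scale_rows(rows):
    """Scale every column of a list of rows to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical customers: (age, annual income).
rows = [(25, 50000), (60, 51000), (27, 54000), (40, 90000)]

# Raw values: the 60-year-old looks closer to the 25-year-old than the
# 27-year-old does, because a difference of 4000 in income swamps the ages.
raw_d01 = euclidean(rows[0], rows[1])
raw_d02 = euclidean(rows[0], rows[2])

scaled = min_max_scale_rows(rows)
scaled_d01 = euclidean(scaled[0], scaled[1])
scaled_d02 = euclidean(scaled[0], scaled[2])

print(raw_d01 < raw_d02)        # True: income dominates
print(scaled_d01 > scaled_d02)  # True: age now matters as well
```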
  • 26. General applications of K-means clustering
      Some example use cases:
      - Behavioral segmentation: segment customers by purchase history; segment users by activities on an application, website or platform; define personas based on interests; create consumer profiles based on activity monitoring.
      - Other general applications: pattern recognition, market segmentation, classification analysis, artificial intelligence, image processing, astronomy, agriculture and many others.
  • 27. Use case 1
      Business problem: grouping loan applicants into high, medium and low risk applicants based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt to income ratio, etc.
      Business benefit: once the segments are identified, the bank has a loan applicants' dataset with each applicant labeled as high, medium or low risk. Based on these labels, the bank can more easily decide whether to give a loan to an applicant and, if so, what credit limit and interest rate the applicant is eligible for given the amount of risk involved.
  • 28. Use case 1. Input dataset:
      Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure
      1039153      21000        701.73               105000         9                     5                 4
      1069697      15000        483.38               92000          11                    5                 2
      1068120      25600        824.96               110000         10                    9                 2
      563175       23000        534.94               80000          9                     2                 12
      562842       19750        483.65               57228          11                    3                 21
      562681       25000        571.78               113000         10                    0                 9
      562404       21250        471.2                31008          12                    1                 12
      700159       14400        448.99               82000          20                    6                 6
      696484       10000        241.33               45000          18                    8                 2
      702598       11700        381.61               45192          20                    7                 3
      702470       10000        243.29               38000          17                    9                 7
      702373       4800         144.77               54000          19                    8                 2
      701975       12500        455.81               43560          15                    8                 4
  • 29. Use case 1. Output: each record has a cluster (segment) assignment, as shown below:
      Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Cluster
      1039153      21000        701.73               105000         9                     5                 4                  Medium
      1069697      15000        483.38               92000          11                    5                 2                  Medium
      1068120      25600        824.96               110000         10                    9                 2                  Medium
      563175       23000        534.94               80000          9                     2                 12                 Low
      562842       19750        483.65               57228          11                    3                 21                 Low
      562681       25000        571.78               113000         10                    0                 9                  Low
      562404       21250        471.2                31008          12                    1                 12                 Low
      700159       14400        448.99               82000          20                    6                 6                  High
      696484       10000        241.33               45000          18                    8                 2                  High
      702598       11700        381.61               45192          20                    7                 3                  High
      702470       10000        243.29               38000          17                    9                 7                  High
      702373       4800         144.77               54000          19                    8                 2                  High
      701975       12500        455.81               43560          15                    8                 4                  High
  • 30. Use case 1. Output: cluster profiles
      Cluster  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Risk segment
      1        10447.30     304.87               66467.74       9.58                  1.69              16.82              Low
      2        21391.58     598.54               94912.59       12.37                 5.98              4.58               Medium
      3        7521.32      227.43               60935.28       16.55                 6.91              4.01               High
      As the table shows, the high, medium and low risk segments have distinctive characteristics. The high risk segment has the highest likelihood of being delinquent, the highest debt to income ratio and the lowest employment tenure of the three segments. The low risk segment exhibits exactly the opposite pattern: the lowest debt to income ratio, the lowest delinquency and the highest employment tenure. Hence delinquency, employment tenure and debt to income ratio are the determining factors when segmenting loan applicants.
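A cluster profile is simply the per-cluster mean of each attribute. The sketch below computes profiles for three of the attributes from the 13 labeled records shown on slide 29. The function and field names are hypothetical. The resulting means will not match the profile table above, which was presumably computed on a larger dataset, but they show the same ordering of segments.

```python
from collections import defaultdict

# (cluster, debt to income ratio, times delinquent, employment tenure),
# copied from the 13 labeled records on slide 29.
records = [
    ("Medium", 9, 5, 4), ("Medium", 11, 5, 2), ("Medium", 10, 9, 2),
    ("Low", 9, 2, 12), ("Low", 11, 3, 21), ("Low", 10, 0, 9), ("Low", 12, 1, 12),
    ("High", 20, 6, 6), ("High", 18, 8, 2), ("High", 20, 7, 3),
    ("High", 17, 9, 7), ("High", 19, 8, 2), ("High", 15, 8, 4),
]
fields = ("debt_to_income", "times_delinquent", "employment_tenure")

def cluster_profiles(rows, field_names):
    """Mean of every field per cluster label."""
    grouped = defaultdict(list)
    for label, *values in rows:
        grouped[label].append(values)
    return {
        label: {name: sum(col) / len(col)
                for name, col in zip(field_names, zip(*vals))}
        for label, vals in grouped.items()
    }

profiles = cluster_profiles(records, fields)
for label in ("Low", "Medium", "High"):
    print(label, profiles[label])
```

On these 13 rows the low risk group has the longest mean employment tenure (13.5) and the fewest delinquencies (1.5), in line with the pattern described on the slide.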
  • 31. Use case 1. Output: cluster distribution
      [Cluster distribution plot] silhouette = 0.6, indicating good quality clusters.
      In the cluster distribution plot there is negligible overlap between the cluster outlines, so the cluster assignment is good in this case. Clusters with an average silhouette width closer to 1 are more desirable, so this index can be used to test the quality of the cluster distribution.
  • 32. Use case 2
      Business problem: organizing customers into groups (segments) based on similar traits, product preferences and expectations. Segments are constructed on the basis of the customers' demographic characteristics, psychographics, past behavior and product use behavior.
      Business benefit: once the segments are identified, marketing messages and even products can be customized for each segment. The better the segments an organization chooses to target, the more successful it is assumed to be in the marketplace.
  • 33. Use case 3
      Business problem: discount analysis and customer retention. Visualize segments of the sales group based on discount behavior, and customer churn segments (customers on the verge of leaving).
      Business benefit: the marketing team can focus efficiently on the risky customer segments in order to keep them from churning. Sales team segments that are facing challenges under the current discounting strategy can be identified, and the deal negotiation strategy can be improved or optimized for them.
  • 34. Want to learn more? Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com. June 2018