K-MEANS CLUSTERING
KSI Microsoft AEP | mentorrbuddy.com
CLUSTERING
• Clustering falls into the category of unsupervised learning.
• Clustering is the task of dividing the population or data points into a number of groups such that points in the same group are more similar to one another than to points in other groups.
WHAT DOES K-MEANS DO FOR YOU?
[Figure: unlabeled points in the (𝑥₁, 𝑥₂) plane before K-Means, and the same points grouped into clusters after K-Means.]
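The picture above corresponds to only a few lines of code. A minimal sketch, assuming scikit-learn is available (the synthetic data and parameter values are illustrative): unlabeled points go in, and a cluster label per point plus the K centroids come out. Later sketches reuse this X as the running example.

```python
# Minimal sketch of "before/after K-Means" (assumes scikit-learn is installed).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled 2-D points (the "before" picture): just a cloud of (x1, x2) values.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means and get one cluster label per point (the "after" picture).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels[:10])              # cluster index assigned to the first 10 points
print(kmeans.cluster_centers_)  # coordinates of the K centroids
```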
HOW DOES IT WORK?
STEP 1: Choose the number K of clusters.
STEP 2: Select K points at random as the initial centroids (not necessarily from your dataset).
STEP 3: Assign each data point to the closest centroid – that forms K clusters.
STEP 4: Compute and place the new centroid of each cluster.
STEP 5: Reassign each data point to the new closest centroid.
If any reassignment took place, go to STEP 4; otherwise go to FIN (model created). A code sketch of these steps follows.
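The five steps map directly onto the loop below, a from-scratch NumPy sketch of the algorithm (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)

    # STEP 1-2: choose K and pick K random initial centroids inside the data's
    # bounding box (not necessarily actual data points).
    lo, hi = X.min(axis=0), X.max(axis=0)
    centroids = rng.uniform(lo, hi, size=(k, X.shape[1]))
    assignments = np.full(len(X), -1)

    for _ in range(max_iter):
        # STEP 3 / STEP 5: assign each point to its closest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = distances.argmin(axis=1)

        # FIN: stop when no reassignment took place.
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments

        # STEP 4: recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(assignments == j):
                centroids[j] = X[assignments == j].mean(axis=0)

    return centroids, assignments
```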
STEP 1: CHOOSE THE NUMBER K OF CLUSTERS: K=2
STEP 2: SELECT K POINTS AT RANDOM AS THE CENTROIDS (NOT NECESSARILY FROM YOUR DATASET)
STEP 3: ASSIGN EACH DATA POINT TO THE CLOSEST CENTROID – THAT FORMS K CLUSTERS
STEP 4: COMPUTE AND PLACE THE NEW CENTROID OF EACH CLUSTER
STEP 5: REASSIGN EACH DATA POINT TO THE NEW CLOSEST CENTROID. IF A REASSIGNMENT TOOK PLACE, GO TO STEP 4; OTHERWISE GO TO FIN.
FINISHED MODEL
RANDOM INITIALIZATION TRAP
[Figure: points in the (𝑥₁, 𝑥₂) plane clustered from one random initialization of the centroids.]
RANDOM INITIALIZATION TRAP
But what would happen if we choose a bad random initialization?
RANDOM INITIALIZATION TRAP
[Figure: the same points in the (𝑥₁, 𝑥₂) plane clustered from a bad random initialization, yielding a noticeably different grouping.]
AVOIDING THE TRAP
Solution: K-Means++
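In scikit-learn, K-Means++ is in fact the default initialization; a small sketch (making it explicit, with illustrative parameter values) that also uses several random restarts as an extra safeguard:

```python
from sklearn.cluster import KMeans

# init='k-means++' spreads the initial centroids apart instead of placing them
# purely at random; n_init=10 additionally runs 10 initializations and keeps
# the run with the lowest WCSS, further reducing the risk of a bad start.
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
kmeans.fit(X)  # X: the unlabeled data from the earlier sketch
```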
WITHIN-CLUSTER SUM OF SQUARES (WCSS)
WCSS = Σ_{Pᵢ ∈ cluster 1} distance(Pᵢ, C₁)² + Σ_{Pᵢ ∈ cluster 2} distance(Pᵢ, C₂)² + Σ_{Pᵢ ∈ cluster 3} distance(Pᵢ, C₃)²
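A short sketch of computing this quantity by hand and checking it against scikit-learn's inertia_ attribute, which stores the WCSS of a fitted model (X is the running example from the earlier sketch):

```python
import numpy as np
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# WCSS: for each cluster, sum the squared distances from its points to its centroid.
wcss = sum(
    np.sum((X[kmeans.labels_ == j] - centroid) ** 2)
    for j, centroid in enumerate(kmeans.cluster_centers_)
)

print(wcss, kmeans.inertia_)  # the two values agree up to floating-point error
```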
CONTD..
[Figure: three clusters in the (𝑥₁, 𝑥₂) plane with centroids C₁, C₂, C₃ and the point-to-centroid distances that enter the WCSS.]
WCSS = Σ_{Pᵢ ∈ cluster 1} distance(Pᵢ, C₁)² + Σ_{Pᵢ ∈ cluster 2} distance(Pᵢ, C₂)² + Σ_{Pᵢ ∈ cluster 3} distance(Pᵢ, C₃)²
[Figure: all points assigned to a single cluster (k = 1).]
WCSS = Σ_{Pᵢ ∈ cluster 1} distance(Pᵢ, C₁)²
[Figure: the same data split into two clusters (k = 2).]
WCSS = Σ_{Pᵢ ∈ cluster 1} distance(Pᵢ, C₁)² + Σ_{Pᵢ ∈ cluster 2} distance(Pᵢ, C₂)²
[Figure: the same data split into three clusters (k = 3).]
WCSS = Σ_{Pᵢ ∈ cluster 1} distance(Pᵢ, C₁)² + Σ_{Pᵢ ∈ cluster 2} distance(Pᵢ, C₂)² + Σ_{Pᵢ ∈ cluster 3} distance(Pᵢ, C₃)²
CHOOSING THE RIGHT NUMBER OF CLUSTERS
[Figure: the same points in the (𝑥₁, 𝑥₂) plane partitioned with different numbers of clusters, motivating the question of how to choose k.]
THE ELBOW METHOD (1/2)
• The elbow method plots the distortion score (WCSS) against k.
• Distortion is measured by WCSS, as discussed on the previous slides.
• The appropriate number of clusters k is chosen at the 'elbow' of the curve, where the distortion stops dropping sharply (see the code sketch after the plot below).
THE ELBOW METHOD (2/2)
[Figure: elbow plot of WCSS (roughly 0–9000) against k = 1…10; the curve falls steeply at first and then flattens, and the bend marks the chosen k.]
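A sketch of producing such a plot (assuming scikit-learn and matplotlib are installed): fit K-Means for k = 1…10, record the WCSS (inertia_) of each fit, and look for the k where the curve bends.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 11)
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    model.fit(X)                 # X: the unlabeled data being clustered
    wcss.append(model.inertia_)  # WCSS for this value of k

plt.plot(ks, wcss, marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('WCSS (distortion)')
plt.title('The Elbow Method')
plt.show()
```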
ASSUMPTIONS
• If any one of these three assumptions is violated, k-means will fail:
• k-means assumes the variance of the distribution of each attribute (variable) is spherical;
• all variables have the same variance (see the standardization sketch below);
• the prior probability of all k clusters is the same, i.e. each cluster has roughly the same number of observations.
• Clusters in k-means are defined by taking the mean of all the data points in the cluster.
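Because of the equal-variance assumption, features measured on very different scales are usually standardized before clustering; a hedged sketch using scikit-learn's StandardScaler in a pipeline:

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Rescale every feature to zero mean and unit variance before clustering, so
# that no single large-scale feature dominates the squared distances.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42),
)
labels = pipeline.fit_predict(X)  # X: the unlabeled data being clustered
```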
ADVANTAGES
• Relatively simple to implement.
• Scales to large data sets.
• Guarantees convergence (to a local optimum).
• Can warm-start the positions of the centroids.
• Easily adapts to new examples.
• Can be generalized to clusters of different shapes and sizes, such as elliptical clusters.
DISADVANTAGES
• The user has to specify k (the number of clusters) up front.
• k-means can only handle numerical data.
• The result depends on the initial centroid values.
• It struggles with clusters of varying sizes and densities.
• It is sensitive to outliers.
• It scales poorly to high-dimensional data, where distances become less informative.
APPLICATIONS
• Marketing: characterizing and discovering customer segments for targeted marketing.
• Biology: grouping different species of plants and animals.
• Libraries: clustering books by topic and subject matter.
• Insurance: understanding customers and their policies, and identifying fraud.
THANK YOU