SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Clustering in Data mining
By
S.Archana
Synopsis
• Introduction
• Clustering
• Why Clustering?
• Several working definitions of clustering
• Methods of clustering
• Applications of clustering
Introduction
• Defined as extracting the information from the
huge set of data.
• Extracting set of patterns from the data set.
Clustering
• Clustering means grouping the objects based on the
information found in the data describing the objects or their
relationships.
• The goal is that the objects in a group will be similar (or
related) to one other and different from (or unrelated to) the
objects in other groups.
• grouped according to logical relationships or consumer
preferences.
• unsupervised learning no target field,
• bottom-up approach.
• originated in anthropology by Driver and Kroeber in 1932 and
introduced to psychology by Zubin in 1938 and Robert
tryon in 1939 and famously used by Cattell beginning in 1943
for trait theory classification in personality psychology.
Examples of Clustering
Why clustering?
• High dimensionality - The clustering algorithm
should not only be able to handle low- dimensional
data but also the high dimensional space.
• Ability to deal with noisy data - Databases contain
noisy, missing or erroneous data. Some algorithms
are sensitive to such data and may lead to poor
quality clusters.
• Interpretability - The clustering results should be
interpretable, comprehensible and usable.
Cont…
• Scalability - We need highly scalable clustering
algorithms to deal with large databases.
• Ability to deal with different kind of attributes -
Algorithms should be capable to be applied on any
kind of data such as interval based (numerical) data,
categorical, binary data.
• Discovery of clusters with attribute shape - The
clustering algorithm should be capable of detect
cluster of arbitrary shape. It should not be bounded to
only distance measures that tend to find spherical
cluster of small size.
Several working definitions of
clustering
• Well separated cluster definition
• Centre based cluster definition
• Contiguous cluster definition(Nearest
neighbour or transitive clustering)
• Similarity based cluster definition
• Density based cluster definition
Methods in Clustering
• Partitioning Method
• Hierarchical Method
• Density-based Method
• Grid-Based Method
• Model-Based Method
• Constraint-based Method
Partitioning Method
• Suppose we are given a database of n objects, the
partitioning method construct k partition of data.
Each partition will represents a cluster and k≤n. It
means that it will classify the data into k groups,
– Each group contain at least one object.
– Each object must belong to exactly one group.
• For a given number of partitions (say k), the
partitioning method will create an initial partitioning.
• Then it uses the iterative relocation technique to
improve the partitioning by moving objects from one
group to other.
Hierarchical Method
• 2 types of HM’s
– Agglomerative Approach
– Divisive Approach
• Agglomerative Approach
– bottom-up approach.
– we start with each object forming a separate group. It keeps on merging
the objects or groups that are close to one another. It keep on doing so
until all of the groups are merged into one or until the termination
condition holds.
• Divisive Approach
– top-down approach.
– we start with all of the objects in the same cluster. In the continuous
iteration, a cluster is split up into smaller clusters. It is down until each
object in one cluster or the termination condition holds.
• DisadvantageThis method is rigid i.e. once merge or split is done, It can
never be undone.
Density based Method
• This method is based on the notion of
density. The basic idea is to continue
growing the given cluster as long as the
density in the neighbourhood exceeds
some threshold i.e. for each data point
within a given cluster, the radius of a
given cluster has to contain at least a
minimum number of points.
Grid based Method
• In this the objects together from a grid. The
object space is quantized into finite number of
cells that form a grid structure.
• AdvantageThe major advantage of this method
is fast processing time.
• It is dependent only on the number of cells in
each dimension in the quantized space.
Model based Method
• In this method a model is hypothesize for each
cluster and find the best fit of data to the given
model. This method locate the clusters by
clustering the density function. This reflects
spatial distribution of the data points.
• This method also serve a way of automatically
determining number of clusters based on standard
statistics , taking outlier or noise into account. It
therefore yield robust clustering methods.
Constraint based Method
• In this method the clustering is performed by
incorporation of user or application oriented
constraints. The constraint refers to the user
expectation or the properties of desired
clustering results. The constraint give us the
interactive way of communication with the
clustering process. The constraint can be
specified by the user or the application
requirement.
Applications
• Pattern Recognition
• Spatial Data Analysis:
• Image Processing
• Economic Science (especially market research)
• Crime analysis
• Bio informatics
• Medical Imaging
• Robotics
• Climatology
Thank you!!

Weitere ähnliche Inhalte

Was ist angesagt?

Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data miningKamal Acharya
 
Data Reduction
Data ReductionData Reduction
Data ReductionRajan Shah
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine LearningSamra Shahzadi
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learningamalalhait
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series dataKrish_ver2
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Salah Amean
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data miningSlideshare
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithmhadifar
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using PythonNishantKumar1179
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Mustafa Sherazi
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clusteringKrish_ver2
 
I. AO* SEARCH ALGORITHM
I. AO* SEARCH ALGORITHMI. AO* SEARCH ALGORITHM
I. AO* SEARCH ALGORITHMvikas dhakane
 

Was ist angesagt? (20)

2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
Data Reduction
Data ReductionData Reduction
Data Reduction
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series data
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
Clustering
ClusteringClustering
Clustering
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
Feature selection
Feature selectionFeature selection
Feature selection
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
I. AO* SEARCH ALGORITHM
I. AO* SEARCH ALGORITHMI. AO* SEARCH ALGORITHM
I. AO* SEARCH ALGORITHM
 

Andere mochten auch

Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
12 งานนำสนอ cluster analysis
12 งานนำสนอ cluster analysis12 งานนำสนอ cluster analysis
12 งานนำสนอ cluster analysiskhuwawa2513
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysissaba khan
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysisguest0edcaf
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationAdnan Masood
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionAdnan Masood
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesGilad Barkan
 
Clustering training
Clustering trainingClustering training
Clustering trainingGabor Veress
 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
 

Andere mochten auch (14)

Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
12 งานนำสนอ cluster analysis
12 งานนำสนอ cluster analysis12 งานนำสนอ cluster analysis
12 งานนำสนอ cluster analysis
 
cluster analysis
cluster analysis cluster analysis
cluster analysis
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysis
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
Clustering: A Survey
Clustering: A SurveyClustering: A Survey
Clustering: A Survey
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummies
 
Clustering training
Clustering trainingClustering training
Clustering training
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
Data mining
Data miningData mining
Data mining
 

Ähnlich wie Clustering in Data Mining

UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningNandakumar P
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfigeabroad
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)Pravinkumar Landge
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDatamining Tools
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit vmalathieswaran29
 
1. METHODS OF CLUSTER ANALYSIS.pptx
1. METHODS OF CLUSTER ANALYSIS.pptx1. METHODS OF CLUSTER ANALYSIS.pptx
1. METHODS OF CLUSTER ANALYSIS.pptxagniva pradhan
 
METHODS OF CLUSTER ANALYSIS.pptx
METHODS OF CLUSTER ANALYSIS.pptxMETHODS OF CLUSTER ANALYSIS.pptx
METHODS OF CLUSTER ANALYSIS.pptxagniva pradhan
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
Cluster_saumitra.ppt
Cluster_saumitra.pptCluster_saumitra.ppt
Cluster_saumitra.pptssuser6b3336
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptxJK970901
 
clustering and distance metrics.pptx
clustering and distance metrics.pptxclustering and distance metrics.pptx
clustering and distance metrics.pptxssuser2e437f
 
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
3b318431-df9f-4a2c-9909-61ecb6af8444.pptxNANDHINIS900805
 

Ähnlich wie Clustering in Data Mining (20)

Data mining
Data miningData mining
Data mining
 
UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data Mining
 
Data mining presentation
Data mining presentationData mining presentation
Data mining presentation
 
Clustering
ClusteringClustering
Clustering
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdf
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit v
 
1. METHODS OF CLUSTER ANALYSIS.pptx
1. METHODS OF CLUSTER ANALYSIS.pptx1. METHODS OF CLUSTER ANALYSIS.pptx
1. METHODS OF CLUSTER ANALYSIS.pptx
 
METHODS OF CLUSTER ANALYSIS.pptx
METHODS OF CLUSTER ANALYSIS.pptxMETHODS OF CLUSTER ANALYSIS.pptx
METHODS OF CLUSTER ANALYSIS.pptx
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Clusteryanam
ClusteryanamClusteryanam
Clusteryanam
 
Cluster_saumitra.ppt
Cluster_saumitra.pptCluster_saumitra.ppt
Cluster_saumitra.ppt
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptx
 
DM_clustering.ppt
DM_clustering.pptDM_clustering.ppt
DM_clustering.ppt
 
Clustering on DSS
Clustering on DSSClustering on DSS
Clustering on DSS
 
clustering and distance metrics.pptx
clustering and distance metrics.pptxclustering and distance metrics.pptx
clustering and distance metrics.pptx
 
Unit 5-1.pdf
Unit 5-1.pdfUnit 5-1.pdf
Unit 5-1.pdf
 
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
 

Kürzlich hochgeladen

An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxPurva Nikam
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 

Kürzlich hochgeladen (20)

An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptx
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 

Clustering in Data Mining

  • 1. Clustering in Data mining By S.Archana
  • 2. Synopsis • Introduction • Clustering • Why Clustering? • Several working definitions of clustering • Methods of clustering • Applications of clustering
  • 3. Introduction • Defined as extracting the information from the huge set of data. • Extracting set of patterns from the data set.
  • 4. Clustering • Clustering means grouping the objects based on the information found in the data describing the objects or their relationships. • The goal is that the objects in a group will be similar (or related) to one other and different from (or unrelated to) the objects in other groups. • grouped according to logical relationships or consumer preferences. • unsupervised learning no target field, • bottom-up approach. • originated in anthropology by Driver and Kroeber in 1932 and introduced to psychology by Zubin in 1938 and Robert tryon in 1939 and famously used by Cattell beginning in 1943 for trait theory classification in personality psychology.
  • 6. Why clustering? • High dimensionality - The clustering algorithm should not only be able to handle low- dimensional data but also the high dimensional space. • Ability to deal with noisy data - Databases contain noisy, missing or erroneous data. Some algorithms are sensitive to such data and may lead to poor quality clusters. • Interpretability - The clustering results should be interpretable, comprehensible and usable.
  • 7. Cont… • Scalability - We need highly scalable clustering algorithms to deal with large databases. • Ability to deal with different kind of attributes - Algorithms should be capable to be applied on any kind of data such as interval based (numerical) data, categorical, binary data. • Discovery of clusters with attribute shape - The clustering algorithm should be capable of detect cluster of arbitrary shape. It should not be bounded to only distance measures that tend to find spherical cluster of small size.
  • 8. Several working definitions of clustering • Well separated cluster definition • Centre based cluster definition • Contiguous cluster definition(Nearest neighbour or transitive clustering) • Similarity based cluster definition • Density based cluster definition
  • 9. Methods in Clustering • Partitioning Method • Hierarchical Method • Density-based Method • Grid-Based Method • Model-Based Method • Constraint-based Method
  • 10. Partitioning Method • Suppose we are given a database of n objects, the partitioning method construct k partition of data. Each partition will represents a cluster and k≤n. It means that it will classify the data into k groups, – Each group contain at least one object. – Each object must belong to exactly one group. • For a given number of partitions (say k), the partitioning method will create an initial partitioning. • Then it uses the iterative relocation technique to improve the partitioning by moving objects from one group to other.
  • 11. Hierarchical Method • 2 types of HM’s – Agglomerative Approach – Divisive Approach • Agglomerative Approach – bottom-up approach. – we start with each object forming a separate group. It keeps on merging the objects or groups that are close to one another. It keep on doing so until all of the groups are merged into one or until the termination condition holds. • Divisive Approach – top-down approach. – we start with all of the objects in the same cluster. In the continuous iteration, a cluster is split up into smaller clusters. It is down until each object in one cluster or the termination condition holds. • DisadvantageThis method is rigid i.e. once merge or split is done, It can never be undone.
  • 12. Density based Method • This method is based on the notion of density. The basic idea is to continue growing the given cluster as long as the density in the neighbourhood exceeds some threshold i.e. for each data point within a given cluster, the radius of a given cluster has to contain at least a minimum number of points.
  • 13. Grid based Method • In this the objects together from a grid. The object space is quantized into finite number of cells that form a grid structure. • AdvantageThe major advantage of this method is fast processing time. • It is dependent only on the number of cells in each dimension in the quantized space.
  • 14. Model based Method • In this method a model is hypothesize for each cluster and find the best fit of data to the given model. This method locate the clusters by clustering the density function. This reflects spatial distribution of the data points. • This method also serve a way of automatically determining number of clusters based on standard statistics , taking outlier or noise into account. It therefore yield robust clustering methods.
  • 15. Constraint based Method • In this method the clustering is performed by incorporation of user or application oriented constraints. The constraint refers to the user expectation or the properties of desired clustering results. The constraint give us the interactive way of communication with the clustering process. The constraint can be specified by the user or the application requirement.
  • 16. Applications • Pattern Recognition • Spatial Data Analysis: • Image Processing • Economic Science (especially market research) • Crime analysis • Bio informatics • Medical Imaging • Robotics • Climatology