SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Clustering
Hierarchical Methods
1
2
Hierarchical Clustering
 Use distance matrix as clustering criteria.
 This method does not require the number of clusters k as an input,
but needs a termination condition
Step 0 Step 1 Step 2 Step 3 Step 4
b
d
c
e
a
a b
d e
c d e
a b c d e
Step 4 Step 3 Step 2 Step 1 Step 0
agglomerative
(AGNES)
divisive
(DIANA)
3
AGNES (Agglomerative Nesting)
 Introduced in Kaufmann and Rousseeuw (1990)
 Implemented in statistical analysis packages
 Use the Single-Link method and the dissimilarity matrix.
 Merge nodes that have the least dissimilarity
 Go on in a non-descending fashion
 Eventually all nodes belong to the same cluster
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
4
Dendrogram
5
DIANA (Divisive Analysis)
 Introduced in Kaufmann and Rousseeuw (1990)
 Implemented in statistical analysis packages, e.g., Splus
 Inverse order of AGNES
 Eventually each node forms a cluster on its own
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
6
Distance Measures
 Minimum distance dmin(Ci, Cj) = min p ∈Ci,p’ ∈Cj |p-p’|
 Nearest Neighbor algorithm
 Single-linkage algorithm – terminates when distance > threshold
 Agglomerative Hierarchical – Minimal Spanning tree
 Maximum distance dmax(Ci, Cj) = max p ∈Ci,p’ ∈Cj |p-p’|
 Farthest-neighbor clustering algorithm
 Complete-linkage - terminates when distance > threshold
 Mean distance dmean(Ci, Cj) = |mi-mj|
 Average distance davg(Ci, Cj) = (1/ninj) ∑ p ∈Ci ∑,p’ ∈Cj |p-p’|
7
Difficulties faced in Hierarchical
Clustering
 Selection of merge/split points
 Cannot revert operation
 Scalability
8
Recent Hierarchical Clustering Methods
 Integration of hierarchical and other techniques:
 BIRCH: uses tree structures and incrementally adjusts the quality of
sub-clusters
 CURE: Represents a Cluster by a fixed number of representative
objects
 ROCK: clustering categorical data by neighbor and link analysis
 CHAMELEON : hierarchical clustering using dynamic modeling
9
BIRCH
 Balanced Iterative Reducing and Clustering using Hierarchies
 Incrementally construct a CF (Clustering Feature) tree, a hierarchical data
structure for multiphase clustering
 Phase 1: scan DB to build an initial in-memory CF tree (a multi-level
compression of the data that tries to preserve the inherent clustering structure of
the data)
 Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the
CF-tree
 Scales linearly: finds a good clustering with a single scan and improves the
quality with a few additional scans
 Weakness: handles only numeric data, and sensitive to the order of the
data record.
10
Clustering Feature Vector in BIRCH
Clustering Feature: CF = (N, LS, SS)
N: Number of data points
LS: ∑N
i=1=Xi
SS: ∑N
i=1=Xi
2
0
1
2
3
4
5
6
7
8
9
10
0 1 2 3 4 5 6 7 8 9 10
CF = (5, (16,30),(54,190))
(3,4)
(2,6)
(4,5)
(4,7)
(3,8)
11
CF-Tree in BIRCH
 Clustering feature:
 summary of the statistics for a given subcluster
 registers crucial measurements for computing cluster and utilizes storage
efficiently
A CF tree is a height-balanced tree that stores the clustering features
for a hierarchical clustering
 A nonleaf node in a tree has descendants or “children”
 The nonleaf nodes store sums of the CFs of their children
 A CF tree has two parameters
 Branching factor: specify the maximum number of children.
 threshold: max diameter of sub-clusters stored at the leaf nodes
12
CF Tree
CF1
child1
CF3
child3
CF2
child2
CF6
child6
CF1
child1
CF3
child3
CF2
child2
CF5
child5
CF1 CF2 CF6prev next CF1 CF2 CF4prev next
B = 7
L = 6
Root
Non-leaf node
Leaf node Leaf node
13
CURE (Clustering Using Representatives)
 Supports Non-Spherical Clusters
 Fixed number of representative points are used
 Select scattered objects and shrink by moving towards cluster
center
 Merge clusters with closest pair of representatives
 For large Databases
 Draw a random Sample S
 Partition
 Partially cluster each partition
 Eliminate outliers and slow growing clusters
 Continue Clustering
 Label Cluster by Representatives
14
Cure: Shrinking Representative Points
 Shrink the multiple representative points towards the gravity center by a
fraction of α.
 Multiple representatives capture the shape of the cluster
x
y
x
y
15
Clustering Categorical Data: ROCK
Algorithm
 ROCK: RObust Clustering using linKs
 Major ideas
 Use links to measure similarity/proximity
 Not distance-based
 Number of Common Neighbours
 Algorithm: sampling-based clustering
 Draw random sample
 Cluster with links
 Label data in disk
Sim T T
T T
T T
( , )1 2
1 2
1 2
=
∩
∪
16
CHAMELEON: Hierarchical Clustering Using
Dynamic Modeling
 Measures the similarity based on a dynamic model
 Two clusters are merged only if the interconnectivity and closeness
(proximity) between two clusters are high
 A two-phase algorithm
1. Use a graph partitioning algorithm: cluster objects into a large number of
relatively small sub-clusters
2. Use an agglomerative hierarchical clustering algorithm: find the genuine
clusters by repeatedly combining these sub-clusters
17
Overall Framework of CHAMELEON
Construct
Sparse Graph Partition the Graph
Merge Partition
Final Clusters
Data Set
Chameleon
 Computes similarity based on Relative Interconnectivity
and Relative Closeness
18

Weitere ähnliche Inhalte

Was ist angesagt?

Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDataminingTools Inc
 
Grid based method & model based clustering method
Grid based method & model based clustering methodGrid based method & model based clustering method
Grid based method & model based clustering methodrajshreemuthiah
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based ClusteringSSA KPI
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data miningZHAO Sam
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Salah Amean
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series dataKrish_ver2
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining Sulman Ahmed
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methodsKrish_ver2
 
Classification and prediction in data mining
Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data miningEr. Nawaraj Bhandari
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reductionmrizwan969
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression TreesHemant Chetwani
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data MiningDHIVYADEVAKI
 
DBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmDBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmPınar Yahşi
 

Was ist angesagt? (20)

Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Grid based method & model based clustering method
Grid based method & model based clustering methodGrid based method & model based clustering method
Grid based method & model based clustering method
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based Clustering
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data mining
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series data
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
 
Classification and prediction in data mining
Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data mining
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)
 
Feature selection
Feature selectionFeature selection
Feature selection
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Lect12 graph mining
Lect12 graph miningLect12 graph mining
Lect12 graph mining
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data Mining
 
DBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmDBSCAN : A Clustering Algorithm
DBSCAN : A Clustering Algorithm
 

Andere mochten auch

Machine Learning and Data Mining: 05 Advanced Association Rule Mining
Machine Learning and Data Mining: 05 Advanced Association Rule MiningMachine Learning and Data Mining: 05 Advanced Association Rule Mining
Machine Learning and Data Mining: 05 Advanced Association Rule MiningPier Luca Lanzi
 
28 Machine Learning Unsupervised Hierarchical Clustering
28 Machine Learning Unsupervised Hierarchical Clustering28 Machine Learning Unsupervised Hierarchical Clustering
28 Machine Learning Unsupervised Hierarchical ClusteringAndres Mendez-Vazquez
 
K-means and Hierarchical Clustering
K-means and Hierarchical ClusteringK-means and Hierarchical Clustering
K-means and Hierarchical Clusteringguestfee8698
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clusteringishmecse13
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clusteringKrish_ver2
 
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...IEEEFINALYEARSTUDENTPROJECTS
 
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...iosrjce
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering AlgorithmLino Possamai
 
Computer Aided Molecular Modeling
Computer Aided Molecular ModelingComputer Aided Molecular Modeling
Computer Aided Molecular Modelingpkchoudhury
 
Molecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryMolecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryLee Larcombe
 
1.9 b trees eg 03
1.9 b trees eg 031.9 b trees eg 03
1.9 b trees eg 03Krish_ver2
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-iKrish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
trabajo de cultural
trabajo de culturaltrabajo de cultural
trabajo de culturalargelures
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithmKrish_ver2
 

Andere mochten auch (20)

Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Machine Learning and Data Mining: 05 Advanced Association Rule Mining
Machine Learning and Data Mining: 05 Advanced Association Rule MiningMachine Learning and Data Mining: 05 Advanced Association Rule Mining
Machine Learning and Data Mining: 05 Advanced Association Rule Mining
 
28 Machine Learning Unsupervised Hierarchical Clustering
28 Machine Learning Unsupervised Hierarchical Clustering28 Machine Learning Unsupervised Hierarchical Clustering
28 Machine Learning Unsupervised Hierarchical Clustering
 
K-means and Hierarchical Clustering
K-means and Hierarchical ClusteringK-means and Hierarchical Clustering
K-means and Hierarchical Clustering
 
Hierarchical clustering
Hierarchical clusteringHierarchical clustering
Hierarchical clustering
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
Converting management controllable resources into best in-class performance 2012
Converting management controllable resources into best in-class performance 2012Converting management controllable resources into best in-class performance 2012
Converting management controllable resources into best in-class performance 2012
 
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...
IEEE 2014 JAVA DATA MINING PROJECTS A similarity measure for text classificat...
 
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering Algorithm
 
Computer Aided Molecular Modeling
Computer Aided Molecular ModelingComputer Aided Molecular Modeling
Computer Aided Molecular Modeling
 
Molecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryMolecular modelling for in silico drug discovery
Molecular modelling for in silico drug discovery
 
1.9 b trees eg 03
1.9 b trees eg 031.9 b trees eg 03
1.9 b trees eg 03
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
trabajo de cultural
trabajo de culturaltrabajo de cultural
trabajo de cultural
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithm
 
Online Trading Concepts
Online Trading ConceptsOnline Trading Concepts
Online Trading Concepts
 
RESUME-ARITRA BHOWMIK
RESUME-ARITRA BHOWMIKRESUME-ARITRA BHOWMIK
RESUME-ARITRA BHOWMIK
 

Ähnlich wie 3.3 hierarchical methods

multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.pptLPrashanthi
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptSubrata Kumer Paul
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10mqasimsheikh5
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapterNaveenKumar5162
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapterNaveenKumar5162
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasicengrasi
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Salah Amean
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basicHouw Liong The
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberHouw Liong The
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxShwetapadmaBabu1
 
Data Mining: Cluster Analysis
Data Mining: Cluster AnalysisData Mining: Cluster Analysis
Data Mining: Cluster AnalysisSuman Mia
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2Shrayes Ramesh
 

Ähnlich wie 3.3 hierarchical methods (20)

My8clst
My8clstMy8clst
My8clst
 
multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.ppt
 
dm_clustering2.ppt
dm_clustering2.pptdm_clustering2.ppt
dm_clustering2.ppt
 
Dataa miining
Dataa miiningDataa miining
Dataa miining
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10Data mining concepts and techniques Chapter 10
Data mining concepts and techniques Chapter 10
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
data mining cocepts and techniques chapter
data mining cocepts and techniques chapterdata mining cocepts and techniques chapter
data mining cocepts and techniques chapter
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
CLUSTERING
CLUSTERINGCLUSTERING
CLUSTERING
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Birch
BirchBirch
Birch
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
Data Mining: Cluster Analysis
Data Mining: Cluster AnalysisData Mining: Cluster Analysis
Data Mining: Cluster Analysis
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2
 

Mehr von Krish_ver2

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back trackingKrish_ver2
 
5.5 back track
5.5 back track5.5 back track
5.5 back trackKrish_ver2
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02Krish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03Krish_ver2
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programmingKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquerKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02Krish_ver2
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing extKrish_ver2
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashingKrish_ver2
 
4.1 sequentioal search
4.1 sequentioal search4.1 sequentioal search
4.1 sequentioal searchKrish_ver2
 

Mehr von Krish_ver2 (20)

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back tracking
 
5.5 back track
5.5 back track5.5 back track
5.5 back track
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programming
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquer
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02
 
5.1 greedy
5.1 greedy5.1 greedy
5.1 greedy
 
5.1 greedy 03
5.1 greedy 035.1 greedy 03
5.1 greedy 03
 
4.4 hashing02
4.4 hashing024.4 hashing02
4.4 hashing02
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing ext
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
4.2 bst
4.2 bst4.2 bst
4.2 bst
 
4.2 bst 03
4.2 bst 034.2 bst 03
4.2 bst 03
 
4.2 bst 02
4.2 bst 024.2 bst 02
4.2 bst 02
 
4.1 sequentioal search
4.1 sequentioal search4.1 sequentioal search
4.1 sequentioal search
 

Kürzlich hochgeladen

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxAmita Gupta
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 

Kürzlich hochgeladen (20)

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 

3.3 hierarchical methods

  • 2. 2 Hierarchical Clustering  Use distance matrix as clustering criteria.  This method does not require the number of clusters k as an input, but needs a termination condition Step 0 Step 1 Step 2 Step 3 Step 4 b d c e a a b d e c d e a b c d e Step 4 Step 3 Step 2 Step 1 Step 0 agglomerative (AGNES) divisive (DIANA)
  • 3. 3 AGNES (Agglomerative Nesting)  Introduced in Kaufmann and Rousseeuw (1990)  Implemented in statistical analysis packages  Use the Single-Link method and the dissimilarity matrix.  Merge nodes that have the least dissimilarity  Go on in a non-descending fashion  Eventually all nodes belong to the same cluster 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
  • 5. 5 DIANA (Divisive Analysis)  Introduced in Kaufmann and Rousseeuw (1990)  Implemented in statistical analysis packages, e.g., Splus  Inverse order of AGNES  Eventually each node forms a cluster on its own 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
  • 6. 6 Distance Measures  Minimum distance dmin(Ci, Cj) = min p ∈Ci,p’ ∈Cj |p-p’|  Nearest Neighbor algorithm  Single-linkage algorithm – terminates when distance > threshold  Agglomerative Hierarchical – Minimal Spanning tree  Maximum distance dmax(Ci, Cj) = max p ∈Ci,p’ ∈Cj |p-p’|  Farthest-neighbor clustering algorithm  Complete-linkage - terminates when distance > threshold  Mean distance dmean(Ci, Cj) = |mi-mj|  Average distance davg(Ci, Cj) = (1/ninj) ∑ p ∈Ci ∑,p’ ∈Cj |p-p’|
  • 7. 7 Difficulties faced in Hierarchical Clustering  Selection of merge/split points  Cannot revert operation  Scalability
  • 8. 8 Recent Hierarchical Clustering Methods  Integration of hierarchical and other techniques:  BIRCH: uses tree structures and incrementally adjusts the quality of sub-clusters  CURE: Represents a Cluster by a fixed number of representative objects  ROCK: clustering categorical data by neighbor and link analysis  CHAMELEON : hierarchical clustering using dynamic modeling
  • 9. 9 BIRCH  Balanced Iterative Reducing and Clustering using Hierarchies  Incrementally construct a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering  Phase 1: scan DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve the inherent clustering structure of the data)  Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree  Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans  Weakness: handles only numeric data, and sensitive to the order of the data record.
  • 10. 10 Clustering Feature Vector in BIRCH Clustering Feature: CF = (N, LS, SS) N: Number of data points LS: ∑N i=1=Xi SS: ∑N i=1=Xi 2 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 CF = (5, (16,30),(54,190)) (3,4) (2,6) (4,5) (4,7) (3,8)
  • 11. 11 CF-Tree in BIRCH  Clustering feature:  summary of the statistics for a given subcluster  registers crucial measurements for computing cluster and utilizes storage efficiently A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering  A nonleaf node in a tree has descendants or “children”  The nonleaf nodes store sums of the CFs of their children  A CF tree has two parameters  Branching factor: specify the maximum number of children.  threshold: max diameter of sub-clusters stored at the leaf nodes
  • 12. 12 CF Tree CF1 child1 CF3 child3 CF2 child2 CF6 child6 CF1 child1 CF3 child3 CF2 child2 CF5 child5 CF1 CF2 CF6prev next CF1 CF2 CF4prev next B = 7 L = 6 Root Non-leaf node Leaf node Leaf node
  • 13. 13 CURE (Clustering Using Representatives)  Supports Non-Spherical Clusters  Fixed number of representative points are used  Select scattered objects and shrink by moving towards cluster center  Merge clusters with closest pair of representatives  For large Databases  Draw a random Sample S  Partition  Partially cluster each partition  Eliminate outliers and slow growing clusters  Continue Clustering  Label Cluster by Representatives
  • 14. 14 Cure: Shrinking Representative Points  Shrink the multiple representative points towards the gravity center by a fraction of α.  Multiple representatives capture the shape of the cluster x y x y
  • 15. 15 Clustering Categorical Data: ROCK Algorithm  ROCK: RObust Clustering using linKs  Major ideas  Use links to measure similarity/proximity  Not distance-based  Number of Common Neighbours  Algorithm: sampling-based clustering  Draw random sample  Cluster with links  Label data in disk Sim T T T T T T ( , )1 2 1 2 1 2 = ∩ ∪
  • 16. 16 CHAMELEON: Hierarchical Clustering Using Dynamic Modeling  Measures the similarity based on a dynamic model  Two clusters are merged only if the interconnectivity and closeness (proximity) between two clusters are high  A two-phase algorithm 1. Use a graph partitioning algorithm: cluster objects into a large number of relatively small sub-clusters 2. Use an agglomerative hierarchical clustering algorithm: find the genuine clusters by repeatedly combining these sub-clusters
  • 17. 17 Overall Framework of CHAMELEON Construct Sparse Graph Partition the Graph Merge Partition Final Clusters Data Set
  • 18. Chameleon  Computes similarity based on Relative Interconnectivity and Relative Closeness 18