Unsupervised Learning
Chap. 10.6
Artificial Intelligence: Structures and Strategies for Complex Problem
Solving, Fifth Edition, George F. Luger
What we will be studying:
Automated Mathematician (AM)
Conceptual Clustering
COBWEB & Structure of Taxonomic Knowledge
So what is unsupervised learning, and how is it
different from supervised learning?
Automated Mathematician (AM)
● One of the earliest successful discovery systems.
● Created by Douglas Lenat in Lisp.
● Began with the concepts of set theory, operations for creating new knowledge by
modifying and combining existing concepts, and a set of heuristics.
● Limitations
○ Although AM discovered prime numbers and several other interesting concepts, it
failed to progress beyond elementary number theory.
○ Inability to “learn to learn”: it did not acquire new heuristics from its new
discoveries in mathematics.
Clustering
● The task of grouping a set of objects so that objects in the same
group (called a cluster) are more similar to each other than to those in other groups
(clusters).
● It is a main task of exploratory data mining and a common technique for statistical
data analysis.
● Used in many fields, including machine learning, pattern recognition, and image
analysis.
The clustering problem
● Begins with a collection of unclassified objects and a means for measuring the
similarity of objects.
● The goal is to organize the objects into classes that meet some standard of quality (such as
maximizing the similarity of objects in the same class).
● Two strategies: numeric taxonomy and agglomerative clustering.
cont.
An agglomerative clustering algorithm builds clusters bottom-up:
● Examine all pairs of objects, select the pair with the highest degree of
similarity, and make that pair a cluster.
● Define the features of the cluster as some function (such as the average) of the features
of the component members, then replace the component objects with
this cluster definition.
● Repeat the process on the collection of objects until all objects have been
reduced to a single cluster.
So the result is a binary tree whose leaf nodes are instances and whose internal
nodes are clusters of increasing size.
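The bottom-up procedure above can be sketched as follows. This is a minimal illustration, not the book's code: the function names, the choice of negative squared Euclidean distance as the similarity measure, and the sample points are all assumptions made for the example. Cluster features are taken as the plain average of the two merged definitions.

```python
from itertools import combinations

def similarity(a, b):
    """Assumed measure: negative squared Euclidean distance (larger = more similar)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(objects):
    """objects: dict mapping a name to a numeric feature vector.
    Returns a binary tree: leaves are names, internal nodes are (left, right) pairs."""
    clusters = [(name, tuple(vec)) for name, vec in objects.items()]
    while len(clusters) > 1:
        # Examine all pairs and select the most similar one.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: similarity(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (tree_a, feat_a), (tree_b, feat_b) = clusters[i], clusters[j]
        # Define the new cluster's features as the average of its two members.
        merged = ((tree_a, tree_b), tuple((x + y) / 2 for x, y in zip(feat_a, feat_b)))
        # Replace the two component objects with the cluster definition.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

tree = agglomerate({"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (5.0, 5.0)})
print(tree)
```

Here "a" and "b" are merged first because they are closest, and "c" joins them at the root, giving a binary tree with the instances at the leaves.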
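We may extend the algorithm to objects described by sets of symbolic features, measuring similarity as the number of shared features divided by the number of distinct features in the two objects together: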
obj1={small,red,rubber,ball}
obj2={small,blue,rubber,ball}
obj3={large,black,wooden,ball}
sim(obj1,obj2)=3/5
sim(obj1,obj3)=sim(obj2,obj3)=1/7
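The values above are consistent with dividing the number of features two objects share by the number of distinct features they have between them (the Jaccard index). A small sketch, with `sim` as an assumed helper name:

```python
from fractions import Fraction

def sim(a, b):
    """Shared features divided by distinct features across both objects.
    Assumes at least one of the two sets is non-empty."""
    return Fraction(len(a & b), len(a | b))

obj1 = {"small", "red", "rubber", "ball"}
obj2 = {"small", "blue", "rubber", "ball"}
obj3 = {"large", "black", "wooden", "ball"}

print(sim(obj1, obj2))  # 3/5
print(sim(obj1, obj3))  # 1/7
print(sim(obj2, obj3))  # 1/7
```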
Conceptual Clustering (CC)
CC addresses the limits of purely similarity-based clustering by using machine learning techniques to produce general
concept definitions and by applying background knowledge.
CLUSTER/2 is a well-known example of the CC approach.
CLUSTER/2
● CLUSTER/2 forms k categories by constructing individuals around k seed objects.
● CLUSTER/2 evaluates the resulting clusters, selecting new seeds and repeating the
process until its quality criteria are met. The algorithm is defined as:
○ Select k seeds from the set of observed objects (selection is done randomly
or by some selection function).
○ For each seed, using that seed as a positive instance and all other seeds as negative
instances, produce a maximally general definition that covers all of the positive and none of the negative
instances (this may lead to multiple classifications of non-seed objects).
○ Classify all objects in the sample according to those descriptions. Replace each
maximally general description with a maximally specific description that
covers all objects in the category. This decreases the likelihood that classes overlap
on unseen objects.
cont.
○ Classes may still overlap on given objects. CLUSTER/2 includes an algorithm for
adjusting overlapping definitions.
○ Using a distance metric, select the element closest to the center of each class (the distance
metric could be similar to the similarity metric used above).
○ Using these central elements as new seeds, repeat steps 1-5 until a desired
quality is met.
○ If the clusters are unsatisfactory and no improvement occurs over several iterations,
select new seeds closest to the edge of each cluster, rather than those at the center.
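The seed-and-reseed loop in the steps above can be illustrated with a much simpler stand-in. The sketch below is not CLUSTER/2: it skips the maximally general and maximally specific symbolic descriptions entirely and just assigns each object to its most similar seed (a k-medoids-style simplification). The function names, the fourth sample object, and the stopping rule are assumptions made for illustration.

```python
from fractions import Fraction

def jaccard(a, b):
    return Fraction(len(a & b), len(a | b))

def seed_clustering(objects, seeds, max_iters=10):
    """objects: list of feature sets; seeds: list of distinct indices into objects."""
    for _ in range(max_iters):
        # Classify every object by its most similar seed (ties go to the earlier seed).
        classes = {s: [] for s in seeds}
        for i, obj in enumerate(objects):
            best = max(seeds, key=lambda s: jaccard(obj, objects[s]))
            classes[best].append(i)
        # Select the most central member of each non-empty class as its new seed.
        new_seeds = [
            max(members, key=lambda m: sum(jaccard(objects[m], objects[o]) for o in members))
            for members in classes.values() if members
        ]
        if sorted(new_seeds) == sorted(seeds):
            break  # seeds stopped changing: treat this as "quality is met"
        seeds = new_seeds
    return classes

objects = [
    {"small", "red", "rubber", "ball"},
    {"small", "blue", "rubber", "ball"},
    {"large", "black", "wooden", "ball"},
    {"large", "brown", "wooden", "ball"},  # hypothetical extra object
]
result = seed_clustering(objects, seeds=[0, 1])
print(sorted(sorted(members) for members in result.values()))
```

Starting from two seeds that both sit in the "small rubber" group, the loop moves one seed to the "large wooden" group and ends with the two groups separated.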
COBWEB & the Structure of Taxonomic Knowledge
● COBWEB is an incremental system for hierarchical conceptual clustering.
● COBWEB employs four basic operations in building the classification
tree:
○ Merging two nodes: replacing them with a node
whose children are the union of the original nodes' sets of children and which
summarizes the attribute-value distributions of all objects classified under
them.
○ Splitting a node: a node is split by replacing it with its children.
○ Inserting a new node: a node is created corresponding to the object being
inserted into the tree.
○ Passing an object down the hierarchy: effectively calling the COBWEB
algorithm on the object and the subtree rooted at the node.
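The insert, merge, and split operations can be sketched on a small tree structure. The `Node` class, its field names, and the attribute names in the example are assumptions for illustration; the merge follows the wording on this slide (the merged node takes over the children of both originals and sums their attribute-value counts).

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    count: int = 0                                        # objects classified under this node
    av_counts: Counter = field(default_factory=Counter)   # (attribute, value) -> count
    children: list = field(default_factory=list)

def leaf(obj):
    """Insert: create a node for a single object given as a dict attribute -> value."""
    return Node(1, Counter(obj.items()))

def merge(parent, a, b):
    """Replace children a and b of parent with one node summarizing both."""
    merged = Node(a.count + b.count, a.av_counts + b.av_counts, a.children + b.children)
    parent.children = [c for c in parent.children if c is not a and c is not b] + [merged]
    return merged

def split(parent, node):
    """Replace node with its children."""
    parent.children = [c for c in parent.children if c is not node] + node.children

a = leaf({"size": "small", "color": "red"})
b = leaf({"size": "small", "color": "blue"})
root = Node(children=[a, b])
m = merge(root, a, b)
print(m.count, m.av_counts[("size", "small")])  # 2 2
```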
cont.
● COBWEB performs a hill-climbing search of the space of possible taxonomies.
● It initializes the taxonomy to a single category. For each subsequent instance, the algorithm
begins with the root category and moves through the tree. At each level it evaluates the
taxonomies resulting from:
○ Placing the instance in the best existing category.
○ Adding a new category containing only the instance.
○ Merging two existing categories into one and adding the instance to that
category.
○ Splitting an existing category into two and placing the instance in the best
new resulting category.
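The slides do not name the function used to evaluate each candidate taxonomy. COBWEB, as described by Fisher, scores a partition with category utility, and the hill-climbing step keeps the option with the highest score. The sketch below computes the commonly cited form of that measure for nominal attributes; the function name and the attribute names (`size`, `color`, `material`) are assumptions for the example.

```python
from collections import Counter
from fractions import Fraction

def category_utility(partition):
    """partition: list of non-empty clusters; each cluster is a list of dicts
    mapping attribute -> value. Higher scores mean more predictive categories."""
    everything = [obj for cluster in partition for obj in cluster]
    n = len(everything)

    def sum_sq(objs):
        # Sum over attribute-value pairs of P(attribute = value)^2 within objs.
        counts = Counter(item for obj in objs for item in obj.items())
        return sum(Fraction(c, len(objs)) ** 2 for c in counts.values())

    base = sum_sq(everything)
    gain = sum(Fraction(len(c), n) * (sum_sq(c) - base) for c in partition)
    return gain / len(partition)

o1 = {"size": "small", "color": "red", "material": "rubber"}
o2 = {"size": "small", "color": "blue", "material": "rubber"}
o3 = {"size": "large", "color": "black", "material": "wooden"}

print(category_utility([[o1, o2], [o3]]))  # 11/18
print(category_utility([[o1, o3], [o2]]))  # 5/18
```

Grouping the two small rubber balls together scores higher than grouping a rubber ball with the wooden one, which is the preference a hill-climbing step would act on.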
