Mauritius JEDI
Machine Learning & Big Data
Clustering Algorithms
Nadeem Oozeer
Machine learning:
• Supervised vs. unsupervised learning
– Supervised learning: an outcome variable is available to guide the
learning process.
• There must be a training data set in which the solution is already known.
– Unsupervised learning: the outcomes are unknown.
• Cluster the data to reveal meaningful partitions and hierarchies.
Clustering:
• Clustering is the task of gathering samples into groups of similar samples
according to some predefined similarity or dissimilarity measure.
[Figure: individual samples grouped into clusters/groups]
• In this example, clustering uses the Euclidean distance as the
dissimilarity measure.
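A minimal sketch of the Euclidean distance, assuming each sample is a sequence of numeric features of equal length: d(x, y) = sqrt(sum_i (x_i - y_i)^2). The helper name `euclidean` is hypothetical; in Python 3.8+ the standard library's `math.dist` computes the same quantity.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of features")
    # Sum the squared per-feature differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two samples that differ by 3 in one feature and by 4 in the other.
print(euclidean((0.0, 0.0), (3.0, 4.0)))  # 5.0
```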
Clustering:
• What is clustering good for?
– Market segmentation - group customers into
different market segments
– Social network analysis - Facebook "smartlists"
– Organizing computer clusters and data centers for
network layout and location
– Astronomical data analysis - Understanding
galaxy formation
Galaxy Clustering:
• Multi-wavelength data obtained for galaxy clusters
– Aim: determine robust criteria for the inclusion of a galaxy in
a galaxy cluster
– Note: the physical parameters of a galaxy cluster can be heavily
influenced by the inclusion of wrong candidates
Credit: HST
Clustering Algorithms:
• Hierarchical methods
– a statistical approach that builds a hierarchy of clusters by
arranging elements at various levels
Dendrogram:
• Cutting the tree at a given level gives one possible set of clusters.
• The height at which two clusters are joined shows their level of
similarity.
• The closer to the bottom two clusters are joined, the more similar
they are.
• Identifying groups from a dendrogram is not simple and is often
subjective.
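The slides do not say how clusters are joined, so the sketch below assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance. It records the height of every merge, which is what a dendrogram draws, and then cuts the tree at a chosen height; the 50% cut in the usage line follows the rule of thumb in the speaker notes. `single_linkage_merges` and `cut_dendrogram` are hypothetical helpers, and the brute-force search is only meant for small toy data.

```python
import math
from itertools import combinations


def single_linkage_merges(points):
    """Agglomerative clustering with single linkage (brute force, toy-sized data).

    Starts with every point in its own cluster and repeatedly joins the two
    closest clusters. Returns one (height, merged_cluster) pair per merge,
    i.e. the information a dendrogram draws.
    """
    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((height, a | b))
    return merges


def cut_dendrogram(n_points, merges, height):
    """Clusters obtained by applying only the merges that happen below `height`."""
    clusters = [frozenset([i]) for i in range(n_points)]
    for h, merged in merges:
        if h >= height:
            break  # single-linkage merge heights never decrease, so stop here
        # The two clusters joined by this merge are exactly those inside `merged`.
        clusters = [c for c in clusters if not c <= merged] + [merged]
    return sorted(sorted(c) for c in clusters)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
merges = single_linkage_merges(points)
top = merges[-1][0]  # height of the final merge (root of the dendrogram)
print(cut_dendrogram(len(points), merges, 0.5 * top))  # [[0, 1, 2], [3, 4]]
```

Here every merge inside a group happens at height 1, while the final merge happens at about 13.45, so any cut between those heights yields the same two clusters; on real data the choice of height is much less clear-cut, which is why reading groups off a dendrogram is often subjective.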
• Partitioning methods
– make an initial division of the data and then use an
iterative strategy to refine it into clusters
– here each object belongs to exactly one cluster
Credit: Legodi, 2014
K-means:
K-means algorithm:
1. Given n objects, initialize k cluster centers.
2. Assign each object to its closest cluster center.
3. Update each cluster center to the mean of the objects assigned to it.
4. Repeat steps 2 and 3 until no cluster center changes.
• Experiment: pack of cards, dominoes
• Apply the K-means algorithm to the Shapley data
– Change the number of clusters k and see how the
clustering differs
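The four steps above correspond to Lloyd's algorithm, sketched below in plain Python. Assumptions not taken from the slides: the initial centers are k distinct data points drawn at random (`seed` makes the draw reproducible), distances are Euclidean, an empty cluster keeps its previous center, and `max_iter` caps the loop. The names `assign` and `kmeans` are hypothetical.

```python
import math
import random


def assign(points, centers):
    """Step 2: index of the closest center for every point (Euclidean distance)."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Step 1: initialize k cluster centers with k distinct data points.
    centers = [tuple(map(float, p)) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep the old center
        # Step 4: stop once no center changes.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
centers, labels = kmeans(points, k=2)
print(sorted(centers))  # [(1.0, 0.0), (11.0, 0.0)]
```

Because the result depends on the starting centers, k-means can settle in a poor local optimum on less well-separated data; a common remedy is to run it from several seeds and keep the solution with the smallest within-cluster sum of squared distances. Calling `kmeans(points, k=3)` on the same data shows how the partition changes with k, in the spirit of the exercise above.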
K Nearest Neighbors (k-NN):
• One of the simplest of all machine learning
classifiers
• Differs from most other machine learning techniques in that it
does not produce a model: the training data are kept and searched
whenever a new observation is classified.
• It does, however, require a distance measure and a choice of K.
• To classify a new observation, first find the K training data
points nearest to it.
• These K points then determine the class of the new observation
(e.g. by majority vote).
1-NN
• Simple idea: label a new point the same as
the closest known point
Label it red.
1-NN: Aspects of an Instance-Based Learner
1. A distance metric
– Euclidean
2. How many nearby neighbors to look at?
– One
3. A weighting function (optional)
– Unused
4. How to fit with the local points?
– Just predict the same output as the nearest
neighbor.
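The four aspects above map directly onto a few lines of code: a Euclidean metric, one neighbor, no weighting, and predicting the nearest neighbor's label. The sketch below assumes the training set is a list of (features, label) pairs; `nn1_predict` and the toy points are hypothetical.

```python
import math


def nn1_predict(train, query):
    """1-NN: return the label of the single training point closest to `query`.

    `train` is a list of (features, label) pairs. No model is fitted; the
    training set itself is searched at query time.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


train = [((0.0, 0.0), "red"), ((1.0, 1.0), "red"), ((5.0, 5.0), "blue")]
print(nn1_predict(train, (0.4, 0.2)))  # red
print(nn1_predict(train, (4.0, 4.5)))  # blue
```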
k-NN
• Generalizes 1-NN to smooth away noise in the labels
• A new point is now assigned the most frequent label of its k
nearest neighbors
Label it red when k = 3.
Label it blue when k = 7.
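The sketch below reproduces the effect described above on made-up points: the same query is labelled red with k = 3 and blue with k = 7, because its two nearest neighbors are red but most of its seven nearest are blue. It assumes Euclidean distance and a plain majority vote; `knn_predict` and the data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the training set by Euclidean distance to the query and keep k.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common(1) returns the most frequent label; on a tie, the label seen
    # first (i.e. the one with the nearer neighbor) wins.
    return votes.most_common(1)[0][0]


# Hypothetical training points at distances 1..7 from the origin.
train = [
    ((1.0, 0.0), "red"), ((0.0, 2.0), "red"),
    ((3.0, 0.0), "blue"), ((0.0, 4.0), "blue"), ((-5.0, 0.0), "blue"),
    ((0.0, -6.0), "blue"), ((7.0, 0.0), "blue"),
]
query = (0.0, 0.0)
print(knn_predict(train, query, 3))  # red
print(knn_predict(train, query, 7))  # blue
```

With two classes, choosing an odd k avoids voting ties altogether.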

Speaker notes

  1. To make use of all the multi-wavelength data obtained for galaxy clusters, we need to determine robust criteria for the inclusion of a galaxy in a galaxy cluster. The physical parameters can be heavily influenced by the inclusion of galaxies that do not belong, and this may lead to false conclusions. Clustering algorithms can be divided into two main groups: hierarchical methods and partitioning methods.
  2. Dendrogram: ... We choose a cut at a similarity level of about 50% of the height; every line that crosses this level indicates a cluster. This method is combined with the partitioning methods to obtain starting points for the mixture-modeling algorithms.
  3. It is a clustering algorithm that tries to partition a set of points into K sets (clusters) such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification.