JEDI
Machine Learning
&
Big Data
Clustering Algorithm
Nadeem Oozeer
SKA SA/ AIMS/ NWU
Jan 2015
Machine learning:
• Supervised vs Unsupervised.
– Supervised learning - an outcome variable is
available to guide the learning process.
• there must be a training data set in which the solution
is already known.
– Unsupervised learning - the outcomes are
unknown.
• cluster the data to reveal meaningful partitions and
hierarchies
Clustering:
• Clustering is the task of gathering samples into groups of similar samples
according to some predefined similarity or dissimilarity measure
[Figure: individual samples gathered into clusters/groups]
• In this example the samples are clustered using the Euclidean distance as
the dissimilarity measure.
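As a small illustration of the measure named above, the Euclidean distance between two points can be sketched as follows. The function name `euclidean` and the sample points are assumptions made for this example, not part of the original slides.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```

Samples that are close under this distance are candidates for the same cluster; other dissimilarity measures could be substituted without changing the idea.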
Clustering:
• What is clustering good for?
– Market segmentation - group customers into
different market segments
– Social network analysis - Facebook "smartlists"
– Organizing computer clusters and data centers for
network layout and location
– Astronomical data analysis - Understanding
galaxy formation
Galaxy Clustering:
• Multi-wavelength data obtained for galaxy clusters
– Aim: determine robust criteria for the inclusion of a galaxy in
a galaxy cluster
– Note: the physical parameters of the galaxy cluster can be heavily
influenced by wrongly included candidates
Credit:
HST
Clustering Algorithms:
• Hierarchical methods
– statistical methods that build clusters by
arranging elements at various levels
Dendrogram:
• Each level then represents a possible
clustering.
• The height in the dendrogram shows the level
of similarity at which any two clusters are joined
• The closer to the bottom they are joined, the more
similar the clusters are
• Finding groups from a dendrogram is not
simple and is very often subjective
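The dendrogram idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses single linkage (distance between the closest members of two clusters), which the slides do not specify, and the names `single_linkage`, `cut` and the toy points are hypothetical. The cut at 50% of the total height follows the rule of thumb given in the notes to this deck. It needs Python 3.8 or later for `math.dist`.

```python
import math

def single_linkage(points):
    """Agglomerative clustering. Returns the merges as
    (height, cluster_a, cluster_b); clusters are tuples of point indices."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, clusters[i], clusters[j]))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges

def cut(n_points, merges, height):
    """Keep only the merges made below `height` and return the groups."""
    parent = list(range(n_points))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for d, a, b in merges:
        if d < height:
            parent[find(a[0])] = find(b[0])
    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy data (assumed): two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
merges = single_linkage(points)
top = max(h for h, _, _ in merges)
print(cut(len(points), merges, 0.5 * top))  # [[0, 1], [2, 3], [4]]
```

Moving the cut height up or down changes the number of groups, which is one reason reading clusters off a dendrogram is subjective.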
• Partitioning methods
– make an initial division of the database and then use an
iterative strategy to further divide it into sections
– here each object belongs to exactly one cluster
Credit: Legodi, 2014
K-means:
K-means algorithm:
1. Given n objects, initialize k cluster centers
2. Assign each object to its closest cluster center
3. Update the center of each cluster
4. Repeat steps 2 and 3 until no cluster center changes
• Experiment: Pack of cards, dominoes
• Apply the K-means algorithm to the Shapley data
– Change the number of clusters and see how the
clustering differs
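The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not a tuned implementation: the function name `kmeans`, the random initialization from the data points, the `seed` and `max_iter` parameters and the toy points are all assumptions made for the example (the Shapley data is not reproduced here). It needs Python 3.8 or later for `math.dist`.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster a list of coordinate tuples into k groups."""
    rng = random.Random(seed)
    # Step 1: pick k of the objects as initial cluster centers.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each object to its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Step 3: move each center to the mean of its members.
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        # Step 4: stop once no center changes.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Toy data (assumed): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
for center, members in zip(centers, clusters):
    print(center, members)
```

Re-running with a different `k` shows how the partition changes with the number of clusters, which is the exercise suggested above.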
K Nearest Neighbors (k-NN):
• One of the simplest of all machine learning
classifiers
• Differs from other machine learning techniques,
in that it doesn't produce a model.
• It does however require a distance measure and
the selection of K.
• First the K nearest training data points to the new
observation are investigated.
• These K points determine the class of the new
observation.
1-NN
• Simple idea: label a new point the same as
the closest known point
Label it red.
1-NN Aspects of an
Instance-Based Learner
1. A distance metric
– Euclidean
2. How many nearby neighbors to look at?
– One
3. A weighting function (optional)
– Unused
4. How to fit with the local points?
– Just predict the same output as the nearest
neighbor.
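The four choices listed above (Euclidean metric, one neighbor, no weighting, copy the neighbor's output) can be sketched as a tiny 1-NN classifier. The function name `nearest_neighbor_label`, the data layout (a list of `(point, label)` pairs) and the sample points are assumptions for illustration. It needs Python 3.8 or later for `math.dist`.

```python
import math

def nearest_neighbor_label(training, query):
    """Return the label of the training point closest to `query`.

    `training` is a list of (point, label) pairs.
    """
    # No model is fitted: the training data itself is searched at query time.
    _, label = min(training, key=lambda item: math.dist(item[0], query))
    return label

training = [((1, 1), "red"), ((5, 5), "blue"), ((6, 5), "blue")]
print(nearest_neighbor_label(training, (2, 1)))  # red
```

Because a single neighbor decides the answer, one mislabeled training point can flip the prediction for everything near it.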
k-NN
• Generalizes 1-NN to smooth away noise in the labels
• A new point is now assigned the most frequent label of its k
nearest neighbors
Label it red, when k = 3
Label it blue, when k = 7
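The dependence of the label on k can be reproduced with a short majority-vote sketch. The function name `knn_label` and the training points are assumptions chosen so that the query point is labeled red for k = 3 and blue for k = 7, mirroring the figure; they are not the data behind the original slide. It needs Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter

def knn_label(training, query, k):
    """Most frequent label among the k training points nearest to `query`.

    `training` is a list of (point, label) pairs.
    """
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Assumed toy data: two red points very close to the origin,
# surrounded by a ring of blue points.
training = [
    ((1, 0), "red"), ((0, 1.2), "red"),
    ((1.5, 0), "blue"), ((0, 2), "blue"), ((2, 0), "blue"), ((-2, 0), "blue"),
    ((0, -3), "red"), ((4, 4), "red"),
]
query = (0, 0)
print(knn_label(training, query, k=3))  # red
print(knn_label(training, query, k=7))  # blue
```

With k = 3 the two nearby red points outvote one blue point; with k = 7 the four blue points outvote three red ones. Choosing an odd k avoids ties in a two-class problem.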

Editor's notes

  1. To make use of all the multi-wavelength data obtained for galaxy clusters, we need to determine robust criteria for the inclusion of a galaxy in a galaxy cluster. The physical parameters can be heavily influenced by the inclusion of galaxies that do not belong, and this may lead to false conclusions. Clustering algorithms can be divided into two main groups: hierarchical methods and partitioning methods.
  2. Dendrogram: .... We choose a set level of similarity at about 50% of the height, and every line that crosses this level indicates a cluster. This method is combined with the partitioning methods to obtain starting points for the mixture-modeling algorithms.
  3. K-means is a clustering algorithm that tries to partition a set of points into K sets (clusters) such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification.