Cluster Analysis Using RapidMiner and SAS

Madhumita Ghosh

Cluster Analysis
Using RapidMiner and SAS 9.3

Agenda
• The Data
• Some preliminary treatments
o Checking for outliers
o Manual outlier checking for a given confidence level
o Filtering outliers
o Data without outliers
o Selecting attributes for clusters
• Setting up clusters
• Reading the clusters
• Using SAS for clustering
• Dendrogram
• Depicting Tree using SAS
• Conclusion

The Data
• Number of observations: 97
• 3 numeric variables:
o Birth rate per thousand
o Death rate per thousand
o Infant mortality rate per thousand
• 1 polynominal (multi-valued nominal) variable: Country
• Data obtained from UN Demographic Yearbook 1990

Some preliminary treatments
• Checking for outliers using RapidMiner

Some preliminary treatments
• Manual checking for outliers at a given confidence level
• For Birth (95%)
o mu - 2(sigma) = 27.384 - 2(12.978) = 1.428
o mu + 2(sigma) = 27.384 + 2(12.978) = 53.34
• Hence, no outliers for Birth

• Filtering outliers
o 10 outliers recorded

• Data without outliers
o Use the Filter Examples operator
o Parameter string: outlier=true
o Invert filter, so that only the examples not flagged as outliers are kept
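
The effect of the inverted filter can be sketched in Python. This is an illustration under assumptions, not RapidMiner code: filter_examples is a hypothetical helper, and the rows (including the country labels and the outlier flags) are made up.

```python
def filter_examples(rows, attribute, value, invert=False):
    """Keep rows where row[attribute] == value; with invert=True keep all other rows."""
    return [row for row in rows if (row[attribute] == value) != invert]


# Made-up examples; the 'outlier' flag stands in for the attribute
# that an outlier-detection step would add to each example.
rows = [
    {"Country": "A", "Birth": 12.0, "outlier": False},
    {"Country": "B", "Birth": 52.0, "outlier": True},
    {"Country": "C", "Birth": 30.0, "outlier": False},
]

# Condition outlier=true with the filter inverted: the flagged rows are dropped.
clean = filter_examples(rows, "outlier", True, invert=True)
print([row["Country"] for row in clean])  # ['A', 'C']
```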

• Selecting attributes for clusters
o Clustering on a polynominal variable such as Country is not meaningful
o Remove Country from the attribute list

• Setting up clusters
o K=3
o Join both nodes to get cluster model information
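
A clustering step with K=3 can be sketched in plain Python. The slides do not name the clustering operator, so this assumes k-means (Lloyd's algorithm). The data points are made-up (birth, death, infant death) triples, and the farthest-first initialization is a choice made here to keep the sketch deterministic; it is not a claim about how RapidMiner initializes its centroids.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until centroids stop moving."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Made-up (birth, death, infant death) rates per thousand, not the UN data.
data = [
    (12, 9, 8), (14, 10, 10), (13, 11, 9),          # low values
    (28, 8, 45), (30, 9, 50), (27, 7, 40),          # moderate values
    (45, 18, 120), (48, 20, 130), (46, 17, 125),    # high values
]
centroids, labels = kmeans(data, k=3)
print(labels)
```

The cluster numbers themselves carry no meaning; only the grouping does, which is why the clusters have to be read from their centroid values afterwards.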

Reading the Clusters
• Cluster 1: Low values of each numeric variable
• Cluster 2: High values of each numeric variable
• Cluster 0: Moderate values of each numeric variable

Reading the Clusters
• Scatter plot: Birth and Death against Infant Death Rate
• Point size: Infant Death Rate

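Using SAS for clustering
• Canonical variables are used to standardize the variables to mean 0 and standard deviation 1
• The transformation aims at a spherical within-cluster covariance matrix

/* Estimate the within-cluster covariance from a proportion (p=.03) of pairs */
/* and write the canonical variables can1-can3 to the data set Ace.          */
proc aceclus data=Poverty out=Ace p=.03 noprint;
   var Birth Death InfantDeath;
run;

/* Ward's minimum-variance clustering on the canonical variables;            */
/* ccc and pseudo request cluster-count statistics, outtree= saves the tree. */
proc cluster data=Ace outtree=Tree method=ward ccc pseudo print=15;
   var can1 can2 can3;
   id Country;
run;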
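
The merging rule behind method=ward can be sketched in Python. At each step, Ward's method joins the two clusters whose union gives the smallest increase in the within-cluster sum of squares. This is a naive illustration, not what PROC CLUSTER does internally: the function names are hypothetical, and the 2D points are made up rather than taken from the canonical variables.

```python
from itertools import combinations


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def ward_clusters(points, n_clusters):
    """Start with one cluster per point and merge the cheapest pair until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


# Made-up 2D points standing in for (can1, can2) scores.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
for cluster in ward_clusters(points, n_clusters=3):
    print(sorted(cluster))
```

Recording the pair and the cost at each merge would give the information a dendrogram is drawn from.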

Using SAS for clustering
• The first 2 canonical variables account for about 93% of the total variation

Dendrogram

Tree depiction
• Plot can1 and can2 against cluster
• Shows a plot similar to the RapidMiner output

Conclusion
• Cluster 1: Mostly developed European nations, USA, UK, Singapore, USSR, etc.
o Efficient allocation of public goods
o Lower crime rates
o Abortion legalized
• Cluster 2: Afghanistan, Pakistan, Iran, mostly underprivileged African nations
o Low GDP
o Abortion not legal
o High crime rates, prevalent wars and terrorist activities
o Poor health standards, high poverty levels
• Cluster 0: India, Mexico, South Africa, Saudi Arabia, etc.
o Emerging nations
o Increasing growth rates
o Controlled negative externalities
o Focus on literacy and employment
