Orange Data Mining and Data Visualization Tool

Orange
Data Mining Tool
Presentation

Group Members:
•Name Registration Number

Why Orange?
● Open source
● Component-based
● No programming
● Data visualization
● Platform-independent software
● Allows clustering and classification
● Data mining through visual programming and Python scripting
Introduction
● Orange is component-based visual programming software for data mining, machine learning, and data analysis.
● Supports communication between data scientists and domain experts.
You can get the Orange software from this link:
https://orange.biolab.si/getting-started/

Getting Started With ORANGE!!

Dataset: Heart Disease
ATTRIBUTES
● Narrowing diameter
● Cholesterol
● Chest pain
● Rest ECG
● Fasting blood sugar
● Max HR
● Age, gender, and more

● Has 303 instances
● 13 attributes
● Categorical class with 2 values (0, 1)
● In .csv format
● Source: preloaded datasets of Orange

Dataset: How do the following factors cause heart disease?
● Age: the risk of heart disease increases with age, particularly after 65.
● Fatty deposits called plaques collect along the artery walls.
● The plaques slow the blood flow to the heart muscle.
● This causes coronary heart disease.
● Gender: heart disease is a leading cause of death for both men and women.

● Angina: chest pain or discomfort caused when the heart muscle does not get enough oxygen-rich blood.
● Cholesterol: when there is too much cholesterol in the blood, it builds up in the walls of the arteries, causing a process called atherosclerosis, which leads to heart disease.
● Diameter narrowing: heart disease is caused by the narrowing or blockage of the coronary arteries.
● Target attribute: 0 or 1

Loading data file into data table:

EDA: Exploratory data analysis
● Distributions

Algorithms:
● KNN
● Naïve Bayes
● Decision Tree
Selected Algorithms:
● Neural Network
● Random Forest
● Logistic Regression

Experimental Setup
This is how we drag and drop the widgets to implement our algorithms.

KNN (k-nearest neighbors)
KNN is a non-parametric method used for classification and regression.
It requires three things:
● The set of stored records.
● A distance metric to compute the distance between records.
● The value of k, the number of nearest neighbors to retrieve for an unknown record.
Math equation (Euclidean distance): $d(p, q) = \sqrt{\sum_{i} (p_i - q_i)^2}$
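The three ingredients listed above (stored records, a distance metric, and k) can be sketched in a few lines of plain Python. This is only an illustration, not how Orange implements its kNN widget; the function names `euclidean` and `knn_predict` and the toy records are assumptions made up for this example.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(records, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored records."""
    # Rank every stored record by its distance to the unknown record.
    ranked = sorted(zip(records, labels), key=lambda rl: euclidean(rl[0], query))
    # Keep the labels of the k closest records and return the most common one.
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, made-up records with two numeric features each (not heart disease data).
records = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = [0, 0, 1, 1]
prediction = knn_predict(records, labels, query=(1.2, 1.4), k=3)
```

With k = 3 the two closest records carry class 0, so the vote goes to class 0 even though one class-1 record is among the three neighbors.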

Decision Tree
● Used to visually and explicitly represent decisions and decision making.
● One of the predictive modelling approaches used in statistics, data mining, and machine learning.
Math equation: $Entropy(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
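As a rough illustration of the entropy measure, Entropy(D) = -sum over i of p_i * log2(p_i), the sketch below computes it for a list of class labels, taking p_i to be the share of records in D that belong to class i. The function name `entropy` and the label lists are assumptions for this example only.

```python
import math
from collections import Counter


def entropy(class_labels):
    """Entropy of a collection of class labels, in bits."""
    total = len(class_labels)
    counts = Counter(class_labels)
    # p_i is the fraction of records that carry class i.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up label lists: an evenly mixed node and a pure node.
mixed = entropy([0, 1, 0, 1])
pure = entropy([1, 1, 1, 1])
```

An evenly split two-class node has the maximum entropy of 1 bit, while a pure node has entropy 0, which is why a decision tree prefers splits that lower this value.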


Naïve Bayes
● Also known as Naïve Bayes classifiers.
● Assumes that attributes are statistically independent of one another for a given class.
● In real data there will usually be some correlation between features.
● Unlike other classifiers, it explicitly models the features as conditionally independent given the class.
$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}$
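A minimal sketch of how Bayes' rule combines with the independence assumption for categorical features: the posterior of each class H is its prior P(H) times the product of the per-feature likelihoods, divided by the evidence P(X). The names `train` and `posterior`, the feature values, and the toy rows are all made up for illustration, and no smoothing is applied, which a practical implementation would normally add.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(label, j)][value] += 1
    return class_counts, feature_counts


def posterior(row, class_counts, feature_counts):
    """Return P(H | X) for every class H, treating features as independent given H."""
    total = sum(class_counts.values())
    scores = {}
    for h, n_h in class_counts.items():
        score = n_h / total  # prior P(H)
        for j, value in enumerate(row):
            # Likelihood P(x_j | H); zero if the value was never seen with H.
            score *= feature_counts[(h, j)][value] / n_h
        scores[h] = score
    evidence = sum(scores.values())  # P(X); assumed non-zero in this sketch
    return {h: s / evidence for h, s in scores.items()}


# Toy, made-up rows: (cholesterol level, chest pain) with a 0/1 class.
rows = [("high", "yes"), ("high", "no"), ("low", "yes"),
        ("high", "no"), ("low", "yes"), ("low", "no")]
labels = [1, 1, 1, 0, 0, 0]
model = train(rows, labels)
probs = posterior(("high", "yes"), *model)
```

For the query ("high", "yes") the class-1 score is 0.5 * 2/3 * 2/3 and the class-0 score is 0.5 * 1/3 * 1/3, so after dividing by their sum the posterior for class 1 is 0.8.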

Random Forest
● It is a flexible and simple algorithm.
● By combining many decision trees, Random Forest reduces the overfitting problem.
● Can be used to identify the most important features in the training dataset.
● It can be used for both classification and regression tasks.

Logistic Regression
● Used to assign observations to a discrete set of classes.
● Logistic regression can be binomial, ordinal, or multinomial:
● Binary (Pass/Fail)
● Multi (Cats, Dogs, Sheep)
● Ordinal (Low, Medium, High)
● The probability scores underlying the model's classifications can be viewed.
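To make the idea of probability scores behind a binary classification concrete, here is a small sketch of the binomial case: a weighted sum of the features is passed through the logistic (sigmoid) function, and the resulting probability is compared with a threshold. The weights, bias, threshold, and function names are hypothetical values chosen for illustration, not fitted to the heart disease data.

```python
import math


def predict_probability(features, weights, bias):
    """Probability of the positive class under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid maps z to the range (0, 1)


def classify(features, weights, bias, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical, hand-picked parameters for two features.
weights = [0.8, -0.4]
bias = -0.2
p = predict_probability([2.0, 1.0], weights, bias)
label = classify([2.0, 1.0], weights, bias)
```

Here the weighted sum is 1.0, so the probability is about 0.73 and the observation is assigned to class 1; the probability itself is the score that can be inspected behind the class label.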

Neural Network
● Neural networks are learning algorithms.
● They interpret sensory data through a kind of machine perception, labeling or clustering raw input.
● They consist of different layers for analyzing and learning from data.
Math equation: $f(x) = b + \sum_{i} w_i x_i$
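The weighted sum f(x) = b + sum of w_i * x_i is what a single unit computes, and a layer is several such units applied to the same inputs. The sketch below stacks one hidden layer and one output unit. The ReLU activation, the weights, and the function names are assumptions added for illustration; the slides do not say which activation or architecture was used.

```python
def weighted_sum(inputs, weights, bias):
    """One unit: bias plus the sum of weight * input."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


def relu(value):
    """Assumed activation: pass positive values, clip negatives to zero."""
    return max(0.0, value)


def layer(inputs, weight_rows, biases):
    """Apply one unit per row of weights, followed by the activation."""
    return [relu(weighted_sum(inputs, w, b)) for w, b in zip(weight_rows, biases)]


# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
hidden = layer([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0])
output = weighted_sum(hidden, [2.0, 3.0], 0.5)
```

The first hidden unit computes -1.5 and is clipped to 0.0, the second computes 2.0, and the output unit combines them into 0.5 + 3.0 * 2.0 = 6.5.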

Table to compare data
Algorithm            Recall  Precision  F-measure
Neural Network       0.813   0.814      0.814
Logistic Regression  0.848   0.848      0.848
Random Forest        0.807   0.807      0.807
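For readers unfamiliar with the three columns, the sketch below shows how recall, precision, and F-measure are commonly computed for one positive class from actual and predicted labels. The label lists are made up, and the slides do not state how Orange averaged the scores across the two classes, so this will not reproduce the figures in the table.

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Precision, recall, and F-measure for the chosen positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(actual, predicted)
```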

References:
https://www.youtube.com/watch?v=pYXOF0jziGM&index=6&list=PLmNPvQr9Tf-ZSDLwOzxpvY-HrE0yv-8Fy
https://www.youtube.com/watch?v=bp0VtVS3LN4&index=9&list=PLmNPvQr9Tf-ZSDLwOzxpvY-HrE0yv-8Fy
https://orange.biolab.si/getting-started/
https://en.wikipedia.org/wiki/Random_forest
https://en.wikipedia.org/wiki/Decision_tree_learning
