3. The purpose of this project is to gain an understanding of the process of data mining by:
Implementing one or more data mining algorithms
Visualizing them
Comparing their performance on datasets
Another aspect was to provide visual tutorials and detailed help about these algorithms
INTRODUCTION TO DATAMINING BY SUMAIRA S.
4. WHAT IS DATA MINING?
Data mining techniques were originally developed to act as expert systems for solving problems
Data mining can be used in any organization that needs to find patterns or relationships in its data.
Different types of Data Mining
5. BASIC FEATURES OF THE PROJECT
Handling different types of data
Preprocessing of data
Implementation of algorithms
Visualization of data mining model
Comparison of different data mining algorithms
Help and visual tutorials
6. HANDLING DIFFERENT DATA FORMATS
The system supports the following types of data files:
Text Data File Handling
CSV (Comma Separated Value) File
Any User Defined Format
Database Data File Handling
MS Access Data File
MS SQL Data File
XML Data File Handling
XML Data File
7. PREPROCESSING OF DATA
Preprocessing of data includes:
Filling in missing values
Ignoring rows
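The two preprocessing options above could be sketched as follows. This is only an illustration, not the project's code: the function names `drop_incomplete_rows` and `fill_missing_with_mean`, the use of `None` to mark a missing value, and the choice of the column mean as the fill value are all assumptions.

```python
from statistics import mean

def drop_incomplete_rows(rows):
    """'Ignore row' option: keep only rows that contain no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]

def fill_missing_with_mean(rows):
    """'Fill missing values' option: replace None with the mean of its column.

    The column mean is an assumed strategy; a column with no known values
    would make statistics.mean raise StatisticsError.
    """
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
print(drop_incomplete_rows(data))
print(fill_missing_with_mean(data))
```

Dropping rows is simpler but discards data; filling keeps every row at the cost of inventing values.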
8. ALGORITHMS’ IMPLEMENTATION
Clustering
Partitional Clustering Algorithm
K-Means Algorithm
Hierarchical Clustering Algorithms
Single Linkage Algorithm
Weighted Average Algorithm
Complete Linkage Algorithm
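The three hierarchical algorithms listed above differ mainly in how the distance from a newly merged cluster to every other cluster is computed. The sketch below shows one common formulation of those update rules. It assumes that "Weighted Average" refers to the WPGMA rule, which the slides do not confirm, and the function name `merged_distance` is hypothetical.

```python
def merged_distance(d_ik, d_jk, method):
    """Distance from the merged cluster (i and j) to another cluster k.

    d_ik and d_jk are the distances from i and from j to k before the merge.
    """
    if method == "single":            # nearest members decide
        return min(d_ik, d_jk)
    if method == "complete":          # farthest members decide
        return max(d_ik, d_jk)
    if method == "weighted_average":  # assumed WPGMA: plain average of the two
        return (d_ik + d_jk) / 2
    raise ValueError(f"unknown method: {method}")

for m in ("single", "complete", "weighted_average"):
    print(m, merged_distance(2.0, 6.0, m))
```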
9. VISUALIZATION OF DATA MINING MODEL
XYScatter Chart Visualization
Dendrogram
Pie Chart
Curve Graph
10. COMPARISON OF DIFFERENT DATA MINING ALGORITHMS
Data File Comparison
Running time
Memory Usage
CPU Usage
Precision/Recall
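Some of the criteria above can be measured with the Python standard library. The sketch below is an illustration under assumptions, not the project's method: `measure` and `precision_recall` are hypothetical helpers, memory is approximated by the peak reported by `tracemalloc`, CPU usage is not covered, and the slides do not say how precision and recall are applied to clustering results.

```python
import time
import tracemalloc

def measure(func, *args):
    """Run func(*args); return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return result, elapsed, peak

def precision_recall(predicted, relevant):
    """Precision and recall of a predicted set against a reference set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

result, seconds, peak_bytes = measure(sorted, [3, 1, 2])
print(result, seconds, peak_bytes)
print(precision_recall({1, 2, 3, 4}, {2, 3, 5}))
```

Any clustering function can be passed to `measure` in place of `sorted`, so that several algorithms are timed on the same data file.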
11. K-MEANS ALGORITHM
K-means was introduced by MacQueen in 1967
12. THE K-MEANS CLUSTERING METHOD
Figure: K-means example with K=2, shown as a sequence of scatter plots on axes running from 0 to 10
Arbitrarily choose K objects as the initial cluster centers
Assign each of the objects to the most similar center
Update the cluster means
Reassign the objects
Update the cluster means
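The loop illustrated on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the project's implementation: the function name `k_means`, the use of Euclidean distance as the similarity measure, the seeded random choice of initial centers, and the `max_iter` cap are all assumptions.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Arbitrarily choose K objects as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each object to the most similar (nearest) center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # no object changed cluster: stop
            break
        labels = new_labels
        # Update the cluster means; an empty cluster keeps its old center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = k_means(points, k=2)
print(labels, centers)
```

Because the initial centers are chosen arbitrarily, the cluster numbering and, on less clearly separated data, the final clusters can change from one seed to another.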
13. SINGLE LINKAGE HIERARCHICAL CLUSTERING
1. Say “Every point is its own cluster”
2. Find the “most similar” pair of clusters
14. SINGLE LINKAGE HIERARCHICAL CLUSTERING
1. Say “Every point is its own cluster”
2. Find the “most similar” pair of clusters
3. Merge it into a parent cluster
15. SINGLE LINKAGE HIERARCHICAL CLUSTERING
1. Say “Every point is its own cluster”
2. Find the “most similar” pair of clusters
3. Merge it into a parent cluster
4. Repeat steps 2 and 3 until all points are in one cluster
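The four steps above can be sketched as follows. This is a minimal, unoptimized sketch under assumptions: the function name `single_linkage` and the `num_clusters` stopping parameter are hypothetical, and "most similar" is taken to mean the smallest Euclidean distance between any two members of the two clusters, which is the usual single linkage definition.

```python
import math

def single_linkage(points, num_clusters=1):
    """Agglomerative clustering with single linkage.

    Returns the final clusters (lists of point indices) and the merge history.
    """
    clusters = [[i] for i in range(len(points))]  # 1. every point is its own cluster
    merges = []
    while len(clusters) > num_clusters:
        best = None
        # 2. Find the most similar pair: smallest distance between any two members.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # 3. Merge the pair into a parent cluster.
        parent = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(parent)
        # 4. Repeat until the requested number of clusters remains.
    return clusters, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = single_linkage(points, num_clusters=2)
print(clusters)
```

The merge history records which clusters were joined and at what distance, which is the information a dendrogram displays.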