On Predicting and Analyzing Breast Cancer using Data Mining Approach

Under the Supervision of
Suman Saha
Assistant Professor
Department of CSE
Bangladesh University of Business and
Technology (BUBT)
Presented by
Md. Masud Rana Basunia
Ismot Ara Pervin
Md. Al Mahmud

Outline
1. Introduction
2. Review of Literature
3. Motivation
4. Proposed Diagram
5. Proposed Methodology
6. Result Analysis
7. Conclusion

Introduction
 Breast cancer is a cancer that develops in breast tissue.
 It is one of the leading cancers among women.
 Early detection is the most effective way to reduce breast cancer
deaths.
 Data mining applies techniques such as classification to extract
useful information from data.
 These techniques make it possible to build a model that learns from
past data and detects patterns.
 With a robustly validated classification model, the chances of a
correct prediction improve.
 It is especially helpful in interpreting results for borderline cases.

Review of Literature
 A Multi-boost SMO classification technique was used to classify
breast cancer.
 Applied KNN, logistic regression, and multivariate linear regression to
classify tumor type on the Wisconsin dataset.
 Used the SPSS Clementine data mining tool and analyzed with various
kernel functions and parameters of the SVM.
 Experimented on breast cancer data using C5 algorithm with bagging
to predict breast cancer survivability.
 Applied the best tree, IBK and SMO to classify tumor type.

Motivation
 Correctly determining whether a tumor is benign or malignant is
important for saving lives.
 A doctor needs considerable time to classify breast cancer, whereas
a data mining approach can classify a case almost instantly.
 It is normally difficult to distinguish certain benign masses from
malignant lesions with mammography, whereas a data mining
approach can detect them effectively.
 It is also cost-effective.
 The model can predict cases with higher accuracy.

Proposed Methodology
 Collecting the dataset.
 Preprocessing the dataset.
 Selecting the best features.
 Applying classification techniques.
 Evaluating model performance.

Proposed Methodology contd.
Dataset Information
The dataset was obtained from the UCI
Machine Learning Repository.
The dataset has 569 instances with 32
attributes (an ID, the diagnosis label, and
30 real-valued features).
Features are computed from a digitized
image of a fine needle aspirate (FNA) of
a breast mass.
There are two classes: Malignant (cancerous)
and Benign (non-cancerous).
Class distribution of Malignant: 212
(37.3%) and Benign: 357 (62.7%)
instances.
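
The figures above can be checked against a public copy of the data. The sketch below assumes that the copy bundled with scikit-learn (`load_breast_cancer`) corresponds to the same UCI dataset; the slides do not say how the authors loaded their file. The bundled copy exposes only the 30 numeric feature columns, without the ID and diagnosis columns.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy matches the UCI dataset used in the slides.
data = load_breast_cancer()
n_instances, n_features = data.data.shape

# Count instances per class name ("malignant" / "benign").
counts = Counter(str(data.target_names[label]) for label in data.target)

# Class share in percent, rounded to one decimal place.
shares = {name: round(100 * n / n_instances, 1) for name, n in counts.items()}

print(n_instances, n_features)
print(dict(counts))
print(shares)
```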

Dataset Preprocessing
Converted categorical values to numeric values using Label
Encoding.
Removed outliers using the Interquartile Range (IQR) method.
Normalized the dataset using the Standard Scaling method.
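
A minimal sketch of the three preprocessing steps, using only the Python standard library. The toy values, the helper names, and the fence multiplier `k=1.5` are assumptions for illustration; the slides do not state the IQR multiplier or the tooling the authors used.

```python
import statistics

def label_encode(labels):
    """Map each distinct label to an integer code, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping

def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def standard_scale(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical toy column; 95.0 is a deliberately extreme value.
radius = [12.0, 13.0, 14.0, 15.0, 16.0, 95.0]
diagnosis = ["M", "B", "B", "M", "B", "M"]

codes, mapping = label_encode(diagnosis)

# Keep only the rows whose value lies inside the IQR fences.
low, high = iqr_bounds(radius)
keep = [low <= v <= high for v in radius]
radius_clean = [v for v, ok in zip(radius, keep) if ok]
codes_clean = [c for c, ok in zip(codes, keep) if ok]

radius_scaled = standard_scale(radius_clean)
print(mapping, radius_clean, radius_scaled)
```

On a real dataset the same filter would be applied per feature column, and a row would typically be dropped if any of its features falls outside the fences.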

Feature Selection
Selected the top 20 features using the
Univariate Feature Selection method.
The method calculates a chi-squared (chi2)
score for each feature.
This improves the prediction
performance of the classifiers.
It also yields faster and more
cost-effective predictors.
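
The slides do not spell out the chi-squared formula. The sketch below assumes the formulation commonly used for univariate selection (the one scikit-learn's `chi2` follows): for each class, the observed sum of the feature is compared with the sum expected from the class frequency alone. It requires non-negative feature values, so it is assumed here that scoring runs on unscaled or otherwise non-negative data. The column names and values are hypothetical.

```python
def chi2_score(feature, labels):
    """Chi-squared statistic between one non-negative feature and class labels."""
    total = sum(feature)
    n = len(labels)
    score = 0.0
    for c in sorted(set(labels)):
        # Observed: sum of the feature over samples of class c.
        observed = sum(v for v, y in zip(feature, labels) if y == c)
        # Expected: total feature sum split by class frequency.
        expected = total * sum(1 for y in labels if y == c) / n
        score += (observed - expected) ** 2 / expected
    return score

def select_k_best(columns, labels, k):
    """Rank feature columns by chi2 score and keep the k highest."""
    scores = {name: chi2_score(col, labels) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: 4 samples, 3 features, binary labels.
labels = [1, 1, 0, 0]
columns = {
    "area": [10, 12, 2, 4],    # differs strongly between the classes
    "texture": [6, 8, 4, 6],   # differs mildly
    "noise": [5, 5, 5, 5],     # identical across the classes
}

top, scores = select_k_best(columns, labels, k=2)
print(top, scores)
```

The slides keep `k = 20` of the available features; the toy example keeps 2 of 3 so that the ranking is easy to follow by hand.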

Proposed Classification Technique
 First applied five classification techniques and chose four
classifiers on the basis of accuracy.
 Then applied a Stacking Classifier, an ensemble method with
two levels.
 In level 0, applied three classification techniques to the dataset
with 10-fold cross-validation and computed their individual outputs.
 In level 1, applied a meta classifier that combines the level-0
outputs and provides the final output.
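
The two-level scheme can be sketched with scikit-learn's `StackingClassifier`. The slides do not name the base or meta classifiers, so the choices below (KNN, SVM, and a decision tree at level 0, logistic regression at level 1), the 75/25 split, and the random seeds are assumptions for illustration only, and the sketch is not expected to reproduce the reported figures.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Level 0: three base classifiers (hypothetical choices).
level0 = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("tree", DecisionTreeClassifier(random_state=0)),
]

# Level 1: a meta classifier trained on the level-0 outputs, which are
# produced out-of-fold with 10-fold cross-validation (cv=10).
stack = StackingClassifier(
    estimators=level0,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=10,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.4f}")
```

Generating the level-0 outputs out-of-fold keeps the meta classifier from being trained on predictions that the base classifiers made for samples they had already seen.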

Model Performance Evaluation
Evaluating the performance of a data mining classification technique
involves testing the proposed model.
A confusion matrix is used to evaluate the performance.
Performance metrics such as accuracy, ROC area, precision, recall, and
F1 score are used to evaluate the performance of the classification
technique.
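
Most of these metrics follow directly from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts, not the results reported in the slides. ROC area is left out because it needs the classifier's scores for each sample, not just the counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts with "malignant" as the positive class.
metrics = classification_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a screening task, recall on the malignant class is often watched most closely, since a false negative means a missed cancer.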

Result Analysis
[Figure: Decision Boundary for Classification Technique]
[Figure: Comparison of Accuracy]

Result Analysis contd.
[Figure: Confusion Matrix of Stacking Classifier]
[Figure: Error Rate for Training and Testing Set]

Result Analysis contd.
[Table: Classification Report for Classification Techniques]

Conclusion
 Automatic prediction of breast cancer is important for limiting the
progression of this disease.
 Data mining classification techniques play a vital role in predicting
breast cancer.
 We have presented a comparative study of different classification
techniques for the detection of breast cancer.
 The Stacking Classifier achieved an accuracy of 97.20% in
determining whether a tumor is benign or malignant.
 The Stacking Classifier performs at a higher level than the other
classifiers compared.
