Breast cancer is one of the most serious diseases affecting women. Predicting breast cancer manually takes a lot of time, and it is difficult for the physician to classify. Detecting cancer through automatic diagnostic techniques is therefore necessary. Data mining is the process of running powerful classification techniques that extract useful information from data. These techniques have found wide application in medical data, where classification tends to simplify the prediction task.
On Predicting and Analyzing Breast Cancer using Data Mining Approach
1. On Predicting and Analyzing Breast Cancer
using Data Mining Approach
Under the Supervision of
Suman Saha
Assistant Professor
Department of CSE
Bangladesh University of Business and
Technology (BUBT)
Presented by
Md. Masud Rana Basunia
Ismot Ara Pervin
Md. Al Mahmud
2. Outline
1. Introduction
2. Review of Literature
3. Motivation
4. Proposed Diagram
5. Proposed Methodology
6. Result Analysis
7. Conclusion
3. Introduction
A cancer that develops in breast tissue.
One of the leading cancers among women.
Early detection is the most effective way to reduce breast cancer
deaths.
Data mining is the process of running powerful classification
techniques that extract useful information from data.
These techniques make it possible to build a model that learns from
past data and detects patterns.
With a robustly validated classification model, chances of the right
prediction improve.
It especially helps in interpreting results for borderline cases.
4. Review of Literature
Multi-boost SMO classification technique used for classifying breast
cancer.
Applied KNN, logistic regression, and multivariate linear regression and
classified tumor type on Wisconsin dataset.
Used the SPSS Clementine data mining tool and analyzed with various
kernel functions and parameters of the SVM.
Experimented on breast cancer data using C5 algorithm with bagging
to predict breast cancer survivability.
Applied the best tree, IBK and SMO to classify tumor type.
5. Motivation
Correctly determining whether a tumor is benign or malignant is
important for saving lives.
A doctor needs quite a bit of time to classify breast cancer, whereas
a data mining approach can classify it almost instantly.
It is normally difficult to distinguish certain benign masses from
malignant lesions with mammography, whereas a data mining
approach can detect them effectively.
It is also cost-effective.
The model could predict cases with higher accuracy.
7. Proposed Methodology
Collecting the dataset.
Preprocessing the dataset.
Selecting best features.
Applying classification technique.
Evaluating model performance.
8. Proposed Methodology contd.
Dataset Information
Dataset was obtained from the UCI
Machine Learning Repository.
The dataset has 569 instances with 32
features.
Features are computed from a digitized
image of a fine needle aspirate (FNA) of
a breast mass.
Two classes as Malignant (Cancerous)
and Benign (Non-Cancerous).
Class distribution of Malignant: 212
(37.3%) and Benign: 357 (62.7%)
instances.
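The class distribution above can be checked with a short sketch. It assumes that scikit-learn's bundled copy of the Wisconsin Diagnostic Breast Cancer data is an acceptable stand-in for the UCI download; that copy exposes 30 numeric features and keeps the diagnosis as a separate target, so the column count differs from the 32 quoted on the slide.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy of the dataset stands in for
# the file downloaded from the UCI Machine Learning Repository.
data = load_breast_cancer()
X, y = data.data, data.target

# In this copy, target 0 is "malignant" and target 1 is "benign".
counts = Counter(str(data.target_names[label]) for label in y)
n_samples, n_features = X.shape

for name, count in counts.items():
    print(f"{name}: {count} ({100 * count / n_samples:.1f}%)")
```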
9. Proposed Methodology contd.
Dataset Preprocessing
Converted categorical values to numeric values using Label
Encoding.
Removed outliers using Interquartile Range (IQR) method.
Normalized the dataset using Standard Scaling method.
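The three preprocessing steps above can be sketched in plain Python on a toy column. The data, the helper names, and the 1.5 x IQR fence multiplier are assumptions for illustration; the slides do not state which multiplier or quartile convention was used.

```python
from statistics import mean, pstdev, quantiles


def label_encode(labels):
    """Map each distinct label to an integer code, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def iqr_bounds(values, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR); k=1.5 is an assumed default."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def standard_scale(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: one feature column and its diagnosis labels.
radius = [12.0, 13.5, 14.1, 11.8, 13.0, 40.0]
diagnosis = ["B", "M", "M", "B", "B", "M"]

encoded, mapping = label_encode(diagnosis)

# Keep only the rows whose feature value lies inside the IQR fences.
low, high = iqr_bounds(radius)
keep = [i for i, v in enumerate(radius) if low <= v <= high]
radius_clean = [radius[i] for i in keep]
labels_clean = [encoded[i] for i in keep]

radius_scaled = standard_scale(radius_clean)
```

On a full dataset the same fence test would be applied per feature column, and a row would be dropped if any of its values fell outside the fences; that row-wise rule is also an assumption here.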
10. Proposed Methodology contd.
Feature Selection
Selected 20 top features using the
Univariate Feature Selection method.
The method calculates a chi-squared (chi2) score for
each feature using the chi-squared formula.
It improves the prediction
performance of the predictors.
It provides faster and more cost-effective
predictors.
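A minimal sketch of the selection step, assuming scikit-learn's SelectKBest with the chi2 scoring function is what "Univariate Feature Selection" refers to. The chi-squared statistic sums (observed - expected)^2 / expected over the cells of a contingency table. Note that chi2 needs non-negative inputs, so this sketch scores the raw measurements rather than standard-scaled values; the order in which the original work combined scaling and selection is not stated.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

# Assumption: scikit-learn's bundled copy of the dataset (30 features).
data = load_breast_cancer()
X, y = data.data, data.target

# chi2 requires non-negative feature values; the raw measurements
# satisfy this, whereas standard-scaled values would not.
selector = SelectKBest(score_func=chi2, k=20)
X_selected = selector.fit_transform(X, y)

# Names and scores of the features, for inspection.
selected_names = data.feature_names[selector.get_support()]
scores = selector.scores_

for name in selected_names:
    print(name)
```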
11. Proposed Methodology contd.
Proposed Classification Technique
First applied five classification techniques and chose four
classifiers on the basis of accuracy.
Then applied a Stacking Classifier, an ensemble method with
two levels.
In level 0, applied three classification techniques to the dataset
with 10-fold cross-validation and computed their individual outputs.
In level 1, applied a meta classifier that combines the level-0
outputs and provides a final output.
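The two-level scheme above can be sketched with scikit-learn's StackingClassifier. The slides do not name the three base classifiers or the meta classifier, so the choices below (KNN, SVM, and random forest at level 0, logistic regression at level 1), the train/test split, and the random seeds are all hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Level 0: three base classifiers (hypothetical choices, not from the slides).
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Level 1: a meta classifier trained on the level-0 outputs, which
# StackingClassifier produces with 10-fold cross-validation (cv=10).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=10,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.4f}")
```

The held-out accuracy from this sketch is not expected to reproduce the figure reported in the slides, since the classifiers, features, and split differ.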
12. Proposed Methodology contd.
Performance Model Evaluation
Evaluating the performance of a data mining classification technique
involves testing the proposed model.
A confusion matrix is used for evaluating the performance.
Performance evaluation parameters such as accuracy, ROC area,
precision, recall, and F1 score are applied to evaluate the performance
of the classification technique.
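The confusion-matrix metrics listed above follow directly from four counts. The sketch below computes them in plain Python on hypothetical labels (1 = malignant, 0 = benign); the function names are illustrative. ROC area is left out because it needs continuous scores rather than hard predictions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical predictions for ten cases: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy case one malignant tumor is missed (a false negative) and one benign tumor is flagged (a false positive), which is why recall and precision both fall below accuracy.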
16. Conclusion
Automatic prediction of breast cancer is important for reducing the
harm caused by this disease.
Data mining classification techniques play a vital role in predicting
breast cancer.
We have presented a comparative study of different classification
techniques for the detection of breast cancer.
The Stacking Classifier achieved an accuracy of 97.20% in
classifying tumors as benign or malignant.
The Stacking Classifier performed at a higher level than the
other classifiers.