A Novel Approach for Breast Cancer Detection Using the Data Mining Tool Weka
1. A Novel Approach for Breast Cancer
Detection using
Data Mining Techniques
Presented by:
• Ahmed Abd Elhafeez
• Ahmed Elbohy
Under supervision of :
Prof. Dr. Aliaa Youssif
3/27/2014 AAST-Comp eng
2. AGENDA
Scientific and Medical Background
1. What is cancer?
2. Breast cancer
3. History and Background
4. Pattern recognition system decomposition
5. About data mining
6. Data mining tools
7. Classification Techniques
3. AGENDA (Cont.)
Paper contents
1. Introduction
2. Related Work
3. Classification Techniques
4. Experiments and Results
5. Conclusion
6. References
4. What Is Cancer?
Cancer is a term used for diseases in which abnormal
cells divide without control and are able to invade
other tissues. Cancer cells can spread to other parts of
the body through the blood and lymph systems.
Cancer is not just one disease but many diseases.
There are more than 100 different types of cancer.
Most cancers are named for the organ or type of cell in
which they start
There are two general types of cancer tumours namely:
• benign
• malignant
5. [Figure: common cancer types: skin, breast, colon, lung, pancreatic, liver, bladder, prostate, kidney, thyroid, leukemia, endometrial, rectal, non-Hodgkin lymphoma, cervical, and oral cancer]
6. Breast Cancer
• The second leading cause of death among women is breast cancer; it comes directly after lung cancer.
• Breast cancer is considered the most common invasive cancer in women, with more than one million cases and nearly 600,000 deaths occurring worldwide annually.
• Breast cancer tops the cancer list in Egypt, with 42 cases per 100,000 of the population. However, 80% of the breast cancer cases in Egypt are of the benign kind.
7. History and Background
Medical Prognosis is the estimation of :
• Cure
• Complication
• disease recurrence
• Survival
for a patient or group of patients after treatment.
8. There is a lot of work done for various diseases like cancer, as shown in paper [1]. The technique used in it is convenient since the Decision Tree is simple to understand, works with mixed data types, models non-linear functions, handles classification, and is supported by most of the readily available tools. Paper [2] discusses how data warehousing, data mining, and decision support systems can reduce the national cancer burden or the oral complications of cancer therapies. For this goal to be achieved, it will first be necessary to monitor populations; collect relevant cancer screening, incidence, treatment, and outcomes data; identify cancer patterns; explain the patterns; and translate the explanations into effective diagnoses and treatments. Paper [3] contains the evaluation of breast masses in a series of pathologically proven tumours using data mining with a decision tree model for classification of breast tumours. Accuracy, sensitivity, specificity, positive predictive value and negative predictive value are the five most generally used objective indices to estimate the performance of diagnosis results. Sensitivity and specificity are the two most important indices that a doctor is concerned about, with sensitivity 93.33%
9. 320 for the detection of bacteria causing eye infections using pure laboratory cultures and the screening of bacteria associated with ENT infections using actual hospital samples.
Bong-Horng Chu and his team [5] propose a hybridized architecture to deal with customer retention.
10. Breast Cancer Classification
Round, well-defined, larger groups are more likely benign.
A tight cluster of tiny, irregularly shaped groups may indicate cancer (malignant).
Suspicious pixel groups show up as white spots on a mammogram.
11. Breast Cancer's Features
• MRI: cancer can have a unique appearance; features that turned out to be cancer are used for diagnosis/prognosis of each cell nucleus.
[Figure: magnetic resonance image → feature extraction → features F1, F2, F3, …, Fn]
13. Computer-Aided Diagnosis
• Mammography allows for efficient diagnosis of
breast cancers at an earlier stage
• Radiologists misdiagnose 10-30% of the malignant
cases
• Of the cases sent for surgical biopsy, only 10-20%
are actually malignant
15. What do these methods do?
• Provide non-parametric models of data.
• Allow classifying new data into pre-defined categories, supporting diagnosis & prognosis.
• Allow discovering new categories.
• Allow understanding the data by creating fuzzy or crisp logical rules.
• Help visualize multi-dimensional relationships among data samples.
16. Pattern recognition system decomposition
dataset → Data Preprocessing → Feature selection → Selecting data mining tool → Classification algorithm (SMO, IBK, BF TREE) → Results and evaluations
20. Data Mining
• Data Mining is set of techniques used
in various domains to give meaning to
the available data
• Objective: Fit data to a model
–Descriptive
–Predictive
21. Predictive & descriptive data mining
• Predictive:
Is the process of automatically creating a classification
model from a set of examples, called the training set,
which belongs to a set of classes.
Once a model is created, it can be used to automatically
predict the class of other unclassified examples
• Descriptive :
Is to describe the general or special features of a set of
data in a concise manner
23. Data mining Tools
Many advanced tools for data mining are
available either as open-source or commercial
software.
24. Weka
• Waikato environment for knowledge analysis
• Weka is a collection of machine learning algorithms for
data mining tasks. The algorithms can either be applied
directly to a dataset or called from your own Java code.
• Weka contains tools for data pre-processing,
classification, regression, clustering, association rules,
and visualization. It is also well-suited for developing
new machine learning schemes.
• Found only on the islands of New Zealand, the Weka is
a flightless bird with an inquisitive nature.
26. Data Preprocessing
• Data in the real world is :
– incomplete: lacking attribute values, lacking certain attributes
of interest, or containing only aggregate data
– noisy: containing errors or outliers
– inconsistent: containing discrepancies in codes or names
• Quality decisions must be based on quality data. Quality measures include:
Accuracy, Completeness, Consistency, Timeliness, Believability, Value added and Accessibility
27. Preprocessing techniques
• Data cleaning
– Fill in missing values, smooth noisy data, identify or remove outliers and
resolve inconsistencies
• Data integration
– Integration of multiple databases, data cubes or files
• Data transformation
– Normalization and aggregation
• Data reduction
– Obtains reduced representation in volume but produces the same or
similar analytical results
• Data discretization
– Part of data reduction but with particular importance, especially for
numerical data
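As a concrete illustration of the cleaning and transformation steps above, here is a minimal Python sketch (not Weka's implementation; the attribute values are made up) that fills missing values with the attribute mean and then min-max normalizes the column:

```python
# Illustrative preprocessing sketch: data cleaning (fill missing values)
# followed by data transformation (min-max normalization to [0, 1]).
def fill_missing(column):
    """Replace None entries with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]

def min_max_normalize(column):
    """Rescale values linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

ages = [35, None, 63, 59, 25]        # hypothetical attribute with a gap
cleaned = fill_missing(ages)         # None -> mean of 35, 63, 59, 25 = 45.5
scaled = min_max_normalize(cleaned)  # 25 -> 0.0, 63 -> 1.0
```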
29. Feature selection
Finding a feature subset that has the most discriminative information from the original feature space.
The objectives of feature selection are:
• Improving the prediction performance of the predictors
• Providing faster and more cost-effective predictors
• Providing a better understanding of the underlying process that generated the data
32. Supervised Learning
• Supervision: The training data (observations, measurements, etc.) are
accompanied by labels indicating the class of the observations
• New data is classified based on the model built on training set
[Figure: known categories "A" and "B"; classification (recognition), i.e., supervised classification]
33. Classification
• Every day, all the time, we classify things.
• E.g., crossing the street:
– Is there a car coming?
– At what speed?
– How far is it to the other side?
– Classification: safe to walk or not!
34. Classification vs. Prediction
Classification:
predicts categorical class labels (discrete or nominal)
classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
Prediction:
models continuous-valued functions, i.e., predicts unknown or missing values
35. Classification—A Two-Step Process
Model construction: describing a set of predetermined classes
Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
The set of tuples used for model construction is the training set
The model is represented as classification rules, decision trees, or mathematical formulae
Model usage: classifying future or unknown objects
Estimate the accuracy of the model
The known label of the test sample is compared with the classified result from the model
The accuracy rate is the percentage of test set samples that are correctly classified by the model
The test set is independent of the training set; otherwise over-fitting will occur
If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
36. Classification Process (1): Model Construction
Training
Data
NAME RANK YEARS TENURED
Mike Assistant Prof 3 no
Mary Assistant Prof 7 yes
Bill Professor 2 yes
Jim Associate Prof 7 yes
Dave Assistant Prof 6 no
Anne Associate Prof 3 no
Classification
Algorithms
IF rank = 'professor'
OR years > 6
THEN tenured = 'yes'
Classifier
(Model)
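The learned rule model on this slide can be sketched directly in Python (an illustrative re-implementation of the rule, not output of any actual classifier):

```python
# The model learned from the training data:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.
def predict_tenured(rank, years):
    """Apply the learned classification rule to one instance."""
    return "yes" if rank == "Professor" or years > 6 else "no"

training = [("Mike", "Assistant Prof", 3), ("Mary", "Assistant Prof", 7),
            ("Bill", "Professor", 2), ("Jim", "Associate Prof", 7),
            ("Dave", "Assistant Prof", 6), ("Anne", "Associate Prof", 3)]
# Re-applying the rule to the training set reproduces every TENURED label.
predictions = {name: predict_tenured(rank, years)
               for name, rank, years in training}
```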
37. Classification Process (2): Use the Model in Prediction
Classifier
Testing
Data
NAME RANK YEARS TENURED
Tom Assistant Prof 2 no
Merlisa Associate Prof 7 no
George Professor 5 yes
Joseph Assistant Prof 7 yes
Unseen Data
(Jeff, Professor, 4)
Tenured?
38. Classification
• is a data mining (machine learning) technique used to predict group membership for data instances.
• Classification analysis is the organization of data into given classes.
• These approaches normally use a training set where all objects are already associated with known class labels.
• The classification algorithm learns from the training set and builds a model.
• Many classification models are used to classify new objects.
39. Classification
• predicts categorical class labels (discrete or
nominal)
• constructs a model based on the training set
and the values (class labels) in a classifying
attribute and uses it in classifying unseen
data
40. Quality of a classifier
• Quality is also calculated with respect to the lowest computing time.
• The quality of a given model can be described by a confusion matrix.
• A confusion matrix shows the predictive ability of the method on new entries.
• Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class.
• Thus the diagonal elements represent correctly classified compounds,
• and the cross-diagonal elements represent misclassified compounds.
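A minimal Python sketch of such a confusion matrix (the predicted and actual labels below are invented for illustration), with rows as predicted classes and columns as actual classes, as described above:

```python
from collections import Counter

def confusion_matrix(predicted, actual, labels):
    """m[i][j] = number of instances predicted as labels[i] whose actual
    class is labels[j]; rows = predicted, columns = actual."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in labels] for p in labels]

pred = ["benign", "benign", "malignant", "benign", "malignant"]
true = ["benign", "malignant", "malignant", "benign", "benign"]
m = confusion_matrix(pred, true, ["benign", "malignant"])
# Diagonal elements are the correctly classified instances.
correct = m[0][0] + m[1][1]
```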
41. Classification Techniques
Building accurate and efficient classifiers for
large databases is one of the essential tasks of
data mining and machine learning research
The ultimate reason for doing classification is to
increase understanding of the domain or to
improve predictions compared to unclassified
data.
44. Support Vector Machine (SVM)
SVM is a state-of-the-art learning machine which has been extensively used as a tool for data classification, function approximation, etc., due to its generalization ability, and has found a great deal of success in many applications.
Unlike traditional methods, which minimize the empirical training error, a noteworthy feature of SVM is that it minimizes an upper bound of the generalization error by maximizing the margin between the separating hyper-plane and the data set.
47. Linear classifiers: Which Hyperplane?
• Lots of possible solutions for a, b, c.
• Some methods find a separating hyperplane, but not the optimal one.
• Support Vector Machine (SVM) finds an optimal solution.
– It maximizes the distance between the hyperplane and the "difficult points" close to the decision boundary.
– One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions.
This line represents the decision boundary: ax + by − c = 0
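The sign test against the boundary ax + by − c = 0, and the point-to-boundary distance that SVM maximizes, can be sketched in a few lines of Python (the coefficients a, b, c are made up for illustration):

```python
import math

# Hypothetical coefficients for the decision boundary ax + by - c = 0.
a, b, c = 1.0, 1.0, 5.0

def side(x, y):
    """Classify a point by the sign of ax + by - c."""
    return "+" if a * x + b * y - c > 0 else "-"

def distance(x, y):
    """Distance from (x, y) to the boundary: |ax + by - c| / sqrt(a^2 + b^2).
    SVM chooses the hyperplane that maximizes the smallest such distance
    over the training points (the margin)."""
    return abs(a * x + b * y - c) / math.hypot(a, b)
```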
48. Selection of a Good Hyper-Plane
Objective: select a 'good' hyper-plane using only the data!
Intuition (Vapnik, 1965), assuming linear separability:
(i) Separate the data
(ii) Place the hyper-plane 'far' from the data
49. SVM – Support Vector Machines
[Figure: support vectors on the margin; a small-margin separator vs. a large-margin separator]
50. Support Vector Machine (SVM)
• SVMs maximize the margin around
the separating hyperplane.
• The decision function is fully
specified by a subset of training
samples, the support vectors.
• Solving SVMs is a quadratic
programming problem
• Seen by many as the most
successful current text
classification method
[Figure: the maximum-margin hyperplane; support vectors lie on the margin, while other separators give a narrower margin]
52. SVM
• Relatively new concept
• Nice generalization properties
• Hard to learn: learned in batch mode using quadratic programming techniques
• Using kernels, can learn very complex functions
54. K-Nearest Neighbor Classifier
Learning by analogy:
"Tell me who your friends are and I'll tell you who you are."
A new example is assigned to the most common class among the K examples that are most similar to it.
55. K-Nearest Neighbor Algorithm
To determine the class of a new example E:
Calculate the distance between E and all examples in the training set
Select the K nearest examples to E in the training set
Assign E to the most common class among its K nearest neighbors
[Figure: E surrounded by "Response" and "No response" neighbors; predicted class: Response]
56. Distance Between Neighbors
Each example is represented with a set of numerical attributes.
"Closeness" is defined in terms of the Euclidean distance between two examples.
The Euclidean distance between X = (x1, x2, x3, …, xn) and Y = (y1, y2, y3, …, yn) is defined as:
D(X, Y) = sqrt( (x1 − y1)² + (x2 − y2)² + … + (xn − yn)² )
John: Age = 35, Income = 95K, No. of credit cards = 3
Rachel: Age = 41, Income = 215K, No. of credit cards = 2
Distance(John, Rachel) = sqrt[ (35 − 41)² + (95K − 215K)² + (3 − 2)² ]
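The Euclidean distance formula above can be checked with a short Python sketch (income expressed in thousands, as on the slide):

```python
import math

def euclidean(x, y):
    """D(X, Y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

john = (35, 95, 3)     # age, income in K, number of credit cards
rachel = (41, 215, 2)
d = euclidean(john, rachel)  # sqrt(36 + 14400 + 1) = sqrt(14437) ≈ 120.15
```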
57. Instance Based Learning
No model is built: store all training examples.
Any processing is delayed until a new instance must be classified.
[Figure: stored "Response" / "No response" examples; the new instance's class: Respond]
58. Example: 3-Nearest Neighbors
Customer | Age | Income | No. credit cards | Response
John | 35 | 35K | 3 | No
Rachel | 22 | 50K | 2 | Yes
Hannah | 63 | 200K | 1 | No
Tom | 59 | 170K | 1 | No
Nellie | 25 | 40K | 4 | Yes
David | 37 | 50K | 2 | ?
59. Customer | Age | Income (K) | No. cards | Response | Distance from David
John | 35 | 35 | 3 | No | sqrt[(35−37)² + (35−50)² + (3−2)²] = 15.16
Rachel | 22 | 50 | 2 | Yes | sqrt[(22−37)² + (50−50)² + (2−2)²] = 15
Hannah | 63 | 200 | 1 | No | sqrt[(63−37)² + (200−50)² + (1−2)²] = 152.23
Tom | 59 | 170 | 1 | No | sqrt[(59−37)² + (170−50)² + (1−2)²] = 122
Nellie | 25 | 40 | 4 | Yes | sqrt[(25−37)² + (40−50)² + (4−2)²] = 15.74
David | 37 | 50 | 2 | ? → Yes (majority of the 3 nearest: Rachel, John, Nellie)
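Putting the table together, here is a small Python sketch of the 3-NN vote for David (an illustrative re-implementation, not Weka's IBk code):

```python
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(query, examples, k=3):
    """Assign the query to the most common class among its k nearest examples."""
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# (age, income in K, no. of cards) -> response, from the table above
training = [((35, 35, 3), "No"), ((22, 50, 2), "Yes"), ((63, 200, 1), "No"),
            ((59, 170, 1), "No"), ((25, 40, 4), "Yes")]
david = (37, 50, 2)
prediction = knn_predict(david, training, k=3)  # Rachel, John, Nellie vote
```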
60. Strengths and Weaknesses
Strengths:
Simple to implement and use
Comprehensible – easy to explain prediction
Robust to noisy data by averaging k-nearest neighbors.
Weaknesses:
Need a lot of space to store all examples.
Takes more time to classify a new example than with a
model (need to calculate and compare distance from
new example to all other examples).
62. – Decision tree induction is a simple but powerful learning paradigm. In this method a set of training examples is broken down into smaller and smaller subsets while at the same time an associated decision tree gets incrementally developed. At the end of the learning process, a decision tree covering the training set is returned.
– The decision tree can be thought of as a set of sentences written in propositional logic.
63. Example
Jenny Lind is a writer of romance novels. A movie
company and a TV network both want exclusive
rights to one of her more popular works. If she signs
with the network, she will receive a single lump sum,
but if she signs with the movie company, the amount
she will receive depends on the market response to
her movie. What should she do?
64. Payouts and Probabilities
• Movie company Payouts
– Small box office - $200,000
– Medium box office - $1,000,000
– Large box office - $3,000,000
• TV Network Payout
– Flat rate - $900,000
• Probabilities
– P(Small Box Office) = 0.3
– P(Medium Box Office) = 0.6
– P(Large Box Office) = 0.1
65. Jenny Lind - Payoff Table
Decisions | Small Box Office | Medium Box Office | Large Box Office
Sign with Movie Company | $200,000 | $1,000,000 | $3,000,000
Sign with TV Network | $900,000 | $900,000 | $900,000
Prior Probabilities | 0.3 | 0.6 | 0.1
66. Using Expected Return Criteria
EV(movie) = 0.3(200,000) + 0.6(1,000,000) + 0.1(3,000,000) = $960,000 = EV(Best)
EV(tv) = 0.3(900,000) + 0.6(900,000) + 0.1(900,000) = $900,000
Therefore, using this criterion, Jenny should select the movie contract.
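The expected-return arithmetic above can be verified with a few lines of Python (an illustrative sketch of the computation):

```python
# Expected return for each decision: sum of payout x probability over the
# states of nature, exactly as on the slide.
probs = {"small": 0.3, "medium": 0.6, "large": 0.1}
movie_payout = {"small": 200_000, "medium": 1_000_000, "large": 3_000_000}
tv_payout = {"small": 900_000, "medium": 900_000, "large": 900_000}

def expected_value(payout):
    return sum(probs[state] * payout[state] for state in probs)

ev_movie = expected_value(movie_payout)  # $960,000
ev_tv = expected_value(tv_payout)        # $900,000
best = "movie" if ev_movie > ev_tv else "tv"
```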
67. Decision Trees
• Three types of "nodes":
– Decision nodes: represented by squares
– Chance nodes: represented by circles (Ο)
– Terminal nodes: represented by triangles (optional)
• Solving the tree involves pruning all but the best decisions at decision nodes, and finding expected values of all possible states of nature at chance nodes
• Create the tree from left to right
• Solve the tree from right to left
69. Jenny Lind Decision Tree
Sign with Movie Co. → Small Box Office: $200,000; Medium Box Office: $1,000,000; Large Box Office: $3,000,000
Sign with TV Network → Small, Medium, or Large Box Office: $900,000
70. Jenny Lind Decision Tree
Sign with Movie Co. → Small (.3): $200,000; Medium (.6): $1,000,000; Large (.1): $3,000,000 ⇒ ER?
Sign with TV Network → Small (.3), Medium (.6), Large (.1): $900,000 ⇒ ER?
71. Jenny Lind Decision Tree - Solved
Sign with Movie Co. → Small (.3): $200,000; Medium (.6): $1,000,000; Large (.1): $3,000,000 ⇒ ER = $960,000
Sign with TV Network → Small (.3), Medium (.6), Large (.1): $900,000 ⇒ ER = $900,000
Best decision (ER = $960,000): sign with Movie Co.
73. Evaluation Metrics
| Predicted as healthy | Predicted as unhealthy
Actual healthy | TP | FN
Actual not healthy | FP | TN
74. Cross-validation
• Correctly Classified Instances: 143 (95.33%)
• Incorrectly Classified Instances: 7 (4.67%)
• Default 10-fold cross-validation, i.e.:
– Split the data into 10 equal-sized pieces
– Train on 9 pieces and test on the remainder
– Do this for all possibilities and average
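A minimal Python sketch of the 10-fold split described above (illustrative only; `k_fold_indices` is a made-up helper, not Weka's evaluator):

```python
# 10-fold cross-validation: partition the indices into 10 folds, hold each
# fold out once for testing, and train on the remaining 9 folds.
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test

n = 150  # the run above classified 143 of 150 instances correctly
fold_sizes = [len(test) for _, test in k_fold_indices(n)]
accuracy = 143 / 150  # 95.33%
```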
75. A Novel Approach for Breast Cancer
Detection using Data Mining Techniques
76. Abstract
The aim of this paper is to investigate the performance of different classification techniques and to develop accurate prediction models for breast cancer using data mining.
Three classification techniques are compared in the Weka software and the comparison results are reported.
Sequential Minimal Optimization (SMO) has higher prediction accuracy than the IBK and BF Tree methods.
77. Introduction
Breast cancer is on the rise across developing nations due to the increase in life expectancy and lifestyle changes such as women having fewer children.
Benign tumors:
• Are usually not harmful
• Rarely invade the tissues around them
• Don't spread to other parts of the body
• Can be removed and usually don't grow back
Malignant tumors:
• May be a threat to life
• Can invade nearby organs and tissues (such as the chest wall)
• Can spread to other parts of the body
• Often can be removed but sometimes grow back
78. Risk factors
Gender
Age
Genetic risk factors
Family history
Personal history of breast cancer
Race: white or black
Dense breast tissue: women with denser breast tissue have a higher risk
Certain benign (non-cancerous) breast problems
Lobular carcinoma in situ
Menstrual periods
79. Risk factors
Breast radiation early in life
Treatment with DES: the drug DES (diethylstilbestrol) during pregnancy
Not having children or having them later in life
Certain kinds of birth control
Using hormone therapy after menopause
Not breastfeeding
Alcohol
Being overweight or obese
80. BACKGROUND
Bittern et al. used an artificial neural network to predict the survivability of breast cancer patients. They tested their approach on a limited data set, but their results show good agreement with actual survival.
Vikas Chaurasia et al. used Representative Tree, RBF Network and Simple Logistic to predict the survivability of breast cancer patients.
Liu Ya-Qin experimented on breast cancer data using the C5 algorithm with bagging to predict breast cancer survivability.
81. BACKGROUND
Bellaachia et al. used naive Bayes, decision tree and back-propagation neural network to predict the survivability of breast cancer patients. Although they reached good results (about 90% accuracy), their results were not significant due to the fact that they divided the data set into two groups: one for the patients who survived more than 5 years and the other for those patients who died before 5 years.
Vikas Chaurasia et al. used Naive Bayes and the J48 Decision Tree to predict the survivability of heart disease patients.
82. BACKGROUND
Vikas Chaurasia et al. used CART (Classification and Regression Tree), ID3 (Iterative Dichotomiser 3) and decision table (DT) to predict the survivability of heart disease patients.
Pan Wen conducted experiments on ECG data to identify abnormal high-frequency electrocardiographs using the decision tree algorithm C4.5.
Dong-Sheng Cao proposed a new decision-tree-based ensemble method combined with the feature selection method of backward elimination to find the structure-activity relationships in the area of chemometrics related to the pharmaceutical industry.
83. BACKGROUND
Dr. S. Vijayarani et al. analysed the performance of different classification function techniques in data mining for predicting heart disease from the heart disease dataset. The classification function algorithms were used and tested in this work. The performance factors used for analysing the efficiency of the algorithms are clustering accuracy and error rate. The results show that the logistic classification function's efficiency is better than that of multilayer perceptron and sequential minimal optimization.
84. BACKGROUND
Kaewchinporn C. presented a new classification algorithm, TBWC, a combination of a decision tree with bagging and clustering. This algorithm was experimented on two medical datasets, cardiotocography1 and cardiotocography2, and on other datasets not related to the medical domain.
BS Harish et al. presented various text representation schemes and compared different classifiers used to classify text documents into the predefined classes. The existing methods are compared and contrasted based on various parameters.
86. BREAST-CANCER-WISCONSIN DATA SET SUMMARY
Obtained from the UC Irvine machine learning repository; data from University of Wisconsin Hospital, Madison, collected by Dr. W. H. Wolberg.
2 classes (malignant and benign) and 9 integer-valued attributes.
The breast-cancer-Wisconsin dataset has 699 instances; we removed the 16 instances with missing values to construct a new dataset with 683 instances.
Class distribution (original 699 instances): Benign: 458 (65.5%), Malignant: 241 (34.5%).
Note: since 2 malignant and 14 benign instances were excluded, those percentages no longer apply; the correct distribution for the 683 instances is Benign: 444 (65%), Malignant: 239 (35%).
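The missing-value removal described above can be sketched as follows. The rows below are a made-up sample in the dataset's format (id, nine 1-10 attributes, class 2/4), with '?' marking a missing Bare Nuclei value as in the UCI file:

```python
# Drop instances containing a missing value ('?'), mirroring the
# construction of the 683-instance dataset from the original 699.
rows = [
    ["1000025", "5", "1", "1", "1", "2", "1", "3", "1", "1", "2"],
    ["1057013", "8", "4", "5", "1", "2", "?", "7", "3", "1", "4"],  # missing
    ["1017122", "8", "10", "10", "8", "7", "10", "9", "7", "1", "4"],
]
complete = [r for r in rows if "?" not in r]
# Class label is the last field: 2 = benign, 4 = malignant.
benign = sum(1 for r in complete if r[-1] == "2")
malignant = sum(1 for r in complete if r[-1] == "4")
```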
87. Attribute | Domain
Sample Code Number | ID Number
Clump Thickness | 1 - 10
Uniformity Of Cell Size | 1 - 10
Uniformity Of Cell Shape | 1 - 10
Marginal Adhesion | 1 - 10
Single Epithelial Cell Size | 1 - 10
Bare Nuclei | 1 - 10
Bland Chromatin | 1 - 10
Normal Nucleoli | 1 - 10
Mitoses | 1 - 10
Class | 2 for Benign, 4 for Malignant
88. EVALUATION METHODS
We have used Weka (Waikato Environment for Knowledge Analysis), version 3.6.9.
WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code.
WEKA contains tools for data preprocessing, classification, regression, clustering, association rules, visualization and feature selection.
It is also well suited for developing new machine learning schemes.
WEKA is open source software issued under the GNU General Public License.
92. EXPERIMENTAL RESULTS
Evaluation Criteria | BF TREE | IBK | SMO
Time To Build Model (sec) | 0.97 | 0.02 | 0.33
Correctly Classified Instances | 652 | 655 | 657
Incorrectly Classified Instances | 31 | 28 | 26
Accuracy (%) | 95.46 | 95.90 | 96.19
93. EXPERIMENTAL RESULTS
The sensitivity, or true positive rate (TPR), is defined by TP / (TP + FN).
The specificity, or true negative rate (TNR), is defined by TN / (TN + FP).
The accuracy is defined by (TP + TN) / (TP + FP + TN + FN).
True positive (TP) = number of positive samples correctly predicted.
False negative (FN) = number of positive samples wrongly predicted.
False positive (FP) = number of negative samples wrongly predicted as positive.
True negative (TN) = number of negative samples correctly predicted.
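These definitions translate directly into code. In the sketch below the TP/FN/FP/TN split is hypothetical (the paper reports only totals); the numbers are chosen only so that they sum to SMO's 657 correct out of 683 instances:

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

def accuracy(tp, fp, tn, fn):
    """(TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical confusion-matrix counts consistent with 657/683 correct.
tp, fn, fp, tn = 430, 14, 12, 227
acc = accuracy(tp, fp, tn, fn)  # (430 + 227) / 683 ≈ 96.19%
```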
98. CONCLUSION
The accuracy of classification techniques is evaluated based on the selected classifier algorithm.
We used three popular data mining methods: Sequential Minimal Optimization (SMO), IBK, and BF Tree.
The performance of SMO is high compared with the other classifiers.
The most important attribute for breast cancer survival is Uniformity of Cell Size.
99. Future work
Using an updated version of Weka
Using another data mining tool
Using alternative algorithms and techniques
100. Notes on paper
Spelling mistakes
No point of contact (e-mail)
Wrong percentage calculation
Copying from old papers
Charts not clear
No contributions
101. Comparison
"Breast Cancer Diagnosis on Three Different Datasets Using Multi-Classifiers", International Journal of Computer and Information Technology (2277 – 0764), Volume 01, Issue 01, September 2012.
That paper introduced a more advanced idea and makes a fusion between classifiers.
102. References
[1] U.S. Cancer Statistics Working Group. United States Cancer
Statistics: 1999–2008 Incidence and Mortality Web-based Report.
Atlanta (GA): Department of Health and Human Services, Centers for
Disease Control
[2] Lyon IAfRoC: World Cancer Report. International Agency for Research on
Cancer Press 2003:188-193.
[3] Elattar, Inas. “Breast Cancer: Magnitude of the Problem”,Egyptian Society
of Surgical Oncology Conference, Taba,Sinai, in Egypt (30 March – 1
April 2005).
[2] S. Aruna, Dr S.P. Rajagopalan and L.V. Nandakishore (2011).
Knowledge based analysis of various statistical tools in detecting
breast cancer.
[3] Angeline Christobel. Y, Dr. Sivaprakasam (2011). An Empirical
Comparison of Data Mining Classification Methods. International
Journal of Computer Information Systems,Vol. 3, No. 2, 2011.
[4] D.Lavanya, Dr.K.Usha Rani,..,” Analysis of feature selection with
classification: Breast cancer datasets”,Indian Journal of Computer
Science and Engineering (IJCSE),October 2011.
[5] E.Osuna, R.Freund, and F. Girosi, “Training support vector
machines:Application to face detection”. Proceedings of computer vision and
pattern recognition, Puerto Rico pp. 130–136.1997.
[6] Vaibhav Narayan Chunekar, Hemant P. Ambulgekar (2009).Approach of
Neural Network to Diagnose Breast Cancer on three different Data Set. 2009
International Conference on Advances in Recent Technologies in
Communication and Computing.
[7] D. Lavanya, “Ensemble Decision Tree Classifier for Breast Cancer Data,”
International Journal of Information Technology Convergence and Services,
vol. 2, no. 1, pp. 17-24, Feb. 2012.
[8] B.Ster, and A.Dobnikar, “Neural networks in medical diagnosis:
Comparison with other methods.” Proceedings of the international
conference on engineering applications of neural networks pp. 427–
430. 1996.
[9] T. Joachims, "Transductive inference for text classification using support
vector machines." Proceedings of the International Conference on Machine Learning,
Slovenia, 1999.
[10] J.Abonyi, and F. Szeifert, “Supervised fuzzy clustering for the
identification of fuzzy classifiers.” Pattern Recognition Letters, vol.14(24),
2195–2207,2003.
[11] Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,School of
Information and Computer Science.
[12] William H. Wolberg, M.D., W. Nick Street, Ph.D., Dennis M. Heisey,
Ph.D., Olvi L. Mangasarian, Ph.D. computerized breast cancer diagnosis and
prognosis from fine needle aspirates, Western Surgical Association meeting in
Palm Desert, California, November 14, 1994.
[13] Street WN, Wolberg WH, Mangasarian OL. Nuclear feature extraction for
breast tumor diagnosis. Proceedings IS&T/ SPIE International Symposium on
Electronic Imaging 1993; 1905:861–70.
[14] Chen, Y., Abraham, A., Yang, B.(2006), Feature Selection and Classification
using Flexible Neural Tree. Journal of Neurocomputing 70(1-3): 305–313.
[15] J. Han and M. Kamber,”Data Mining Concepts and Techniques”,Morgan
Kauffman Publishers, 2000.
[16] Bishop, C.M.: “Neural Networks for Pattern Recognition”. Oxford
University Press,New York (1999).
[17] Vapnik, V.N., The Nature of Statistical Learning Theory, 1st ed.,Springer-
Verlag,New York, 1995.
[18] Ross Quinlan, (1993) C4.5: Programs for Machine Learning, Morgan
Kaufmann Publishers, San Mateo, CA.