Breast Cancer Classification based on Unsupervised Linear
Transformation along with Cos Similarity
Machine Learning
Dr. Ashwan A. Abdulmunem
8/2/2021
Introduction
- Breast cancer is one of the leading causes of mortality in women. Early detection and treatment are
imperative for improving survival rates.
- According to a recent report published by the American Cancer Society, breast cancer is the most prevalent
form of cancer among women in the USA. For 2017 alone, studies indicate that approximately 252,000 new cases of
invasive breast cancer and 63,000 cases of in situ breast cancer are expected to be diagnosed, with 40,000
breast cancer-related deaths expected to occur [1]. Consequently, there is a real need for early diagnosis and
treatment, in order to reduce morbidity rates and improve patients’ quality of life.
[1] DeSantis, C.E., Ma, J., Goding Sauer, A., Newman, L.A., Jemal, A.: Breast cancer statistics, 2017, racial
disparity in mortality by state. CA: a cancer journal for clinicians 67(6) (2017) 439–448
https://www.memorialplasticsurgery.com/breast-cancer-statistics-2017/
American Cancer Society Statistics of Breast Cancer
Breast Cancer: General Classification Approaches
● Grade. Grading focuses on the appearance of the breast cancer cells compared to the appearance of normal
breast tissue. Normal cells in an organ like the breast become differentiated, meaning that they take on specific
shapes and forms that reflect their function as part of that organ. Pathologists describe cells as well differentiated
(low-grade), moderately differentiated (intermediate-grade), and poorly differentiated (high-grade) as the cells
progressively lose the features seen in normal breast cells.
● Stage. The TNM classification for staging breast cancer is based on the size of the cancer where it originally
started in the body and the locations to which it has travelled.
TNM stands for tumour, node, and metastasis.
● DNA-based classification. Understanding the specific details of a particular breast cancer may include looking
at the cancer cell DNA by several different laboratory approaches. When specific DNA mutations or gene
expression profiles are identified in the cancer cells this may guide the selection of treatments, either by targeting
these changes, or by predicting from these alterations which non-targeted therapies are most effective.
Artificial Intelligence and Breast Cancer Classification
Proposed Method: Abstract
- Detection and classification of breast cancer at the cellular level is one of the most
challenging problems. Since the morphology and other cellular features of cancer
cells are different from normal healthy cells, it is possible to classify cancer cells
and normal cells using such features.
- The classical methods of segmentation and classification for malignant cells are not
only repetitive but also very time-consuming [2].
- Using PCA to select robust and informative features
[2] Khan, S.U., Islam, N., Jan, Z. et al. A machine learning-based approach for the segmentation and classification of malignant
cells in breast cytology images using gray level co-occurrence matrix (GLCM) and support vector machine (SVM). Neural
Comput & Applic (2020). https://doi.org/10.1007/s00521-021-05697-1
[Figure: Types of Cell (Normal, Benign, Malignant)]
Steps of Proposed Breast Cancer Classification
http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
Breast Cancer Dataset
o Number of instances: 569
o ID number of patients
o Diagnosis (M = Malignant, B = Benign)
o 30 features
Ten real-valued features:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension
The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for
each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
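The field layout described above can be sketched as follows. This is a minimal illustration, assuming the fields are ordered as ID, diagnosis, then the ten means, the ten standard errors, and the ten "worst" values; the names `BASE_FEATURES`, `STATISTICS`, and `field_names` are hypothetical helpers, not part of the dataset.

```python
# Ten base measurements listed on the slide (items a to j).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]

# Three statistics computed for each base measurement.
STATISTICS = ["mean", "SE", "worst"]


def field_names():
    """Return the 32 field names: ID, diagnosis, then 3 x 10 features.

    Assumption: all ten means come first, then all ten standard errors,
    then all ten "worst" values, which matches the example field numbers
    given on the slide.
    """
    names = ["ID", "diagnosis"]
    for stat in STATISTICS:
        for feature in BASE_FEATURES:
            names.append(f"{stat} {feature}")
    return names


names = field_names()
for field_number in (3, 13, 23):
    # Field numbers on the slide are 1-based.
    print(field_number, names[field_number - 1])
```

Under this ordering, fields 3, 13, and 23 come out as the mean, SE, and worst radius, in agreement with the slide.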
[Figure: Breast Cancer Dataset, without PCA and with PCA]
Experimental Procedures
Unsupervised Linear Transformation or Dimensionality Reduction (PCA)
We propose combining PCA with cosine similarity, a combination we call the PCA-Cos algorithm, to find
the best features of the cancer dataset. Principal Component Analysis (PCA) is well known for
dimensionality reduction and for statistical measurement when handling large datasets.
PCA (cont.)
Sometimes we need to "compress" our data to speed up algorithms or to visualize data. One way is to use
dimensionality reduction which is the process of reducing the number of random variables under
consideration by obtaining a set of principal variables.
Two approaches:
Feature selection: find a subset of the input variables.
Feature projection (also Feature extraction): transforms the data in the high-dimensional space to a space
of fewer dimensions. PCA is one of the methods following this approach.
PCA (cont.)
How does PCA work "mathematically" (precisely)? We need to know about:
• Mean: finds the most balanced point in the data.
• Variance: measures the spread of the data around the mean.
• Covariance: indicates the direction in which the data are spreading.
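The three quantities above can be computed by hand on a toy example. This is a minimal sketch, assuming the sample (n - 1) convention; the values in `radius`, `area`, and `flipped` are made up for illustration and are not taken from the dataset.

```python
def mean(values):
    """Arithmetic mean: the balance point of the values."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def variance(xs):
    """Sample variance: the covariance of a variable with itself."""
    return covariance(xs, xs)


# Made-up measurements, for illustration only.
radius = [2.0, 4.0, 6.0, 8.0]
area = [1.0, 3.0, 5.0, 7.0]      # grows together with radius
flipped = area[::-1]             # shrinks as radius grows

print(mean(radius))                  # 5.0
print(variance(radius))              # spread of radius around its mean
print(covariance(radius, area))      # positive: the two increase together
print(covariance(radius, flipped))   # negative: one increases as the other decreases
```

The sign of the covariance is what tells us the direction of the spread: positive when two features rise together, negative when they move in opposite directions.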
PCA Algorithm
1. Subtract the mean of each feature so that the data are centred on the origin of the axes.
2. From the data (many features x1, x2, …, xN), construct the covariance matrix U.
3. Find the eigenvalues λ1, λ2, … and the corresponding eigenvectors v1, v2, … of that matrix (we call
them eigenstuffs). Choose the K < N pairs (λ, v) with the highest eigenvalues to get a reduced
set of K eigenvectors.
4. Project the original data points onto the K-dimensional subspace spanned by these
eigenvectors. This step creates new data points in a new K-dimensional space.
5. Now, instead of solving the original problem (N features), we only need to solve a new problem
with K features (K < N).
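The steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the function name `pca_project` and the random demo data are hypothetical, and the eigendecomposition uses `numpy.linalg.eigh`, which suits the symmetric covariance matrix.

```python
import numpy as np


def pca_project(X, k):
    """Project the rows of X onto the k leading principal components.

    Returns the projected data, the k largest eigenvalues, and the
    matrix whose columns are the matching eigenvectors.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: subtract the per-feature mean to centre the data.
    centered = X - X.mean(axis=0)

    # Step 2: N x N covariance matrix (features are columns).
    cov = np.cov(centered, rowvar=False)

    # Step 3: eigenvalues and eigenvectors; eigh returns them in
    # ascending order, so sort descending and keep the first k.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]

    # Step 4: project onto the K-dimensional subspace.
    projected = centered @ components
    return projected, eigvals[order], components


# Hypothetical demo data: 200 samples, 5 features, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

Z, top_vals, W = pca_project(X, k=2)
print(Z.shape)      # (200, 2): step 5, a problem with K = 2 features
```

A useful sanity check is that the variance of each projected column equals the corresponding eigenvalue, which is why keeping the largest eigenvalues keeps the most spread.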
Classification
Cosine Similarity:
• A measure of similarity between two non-zero vectors of an inner
product space
• The cosine of the trigonometric angle between two vectors
• The inner product of two vectors normalized to length 1
• Not a measure of vector magnitude, just the angle between vectors
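The definition above can be sketched in a few lines. The slides do not spell out the decision rule used for classification, so the `classify` helper below is an assumption: it assigns the label of the most similar labelled vector. The training points are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Inner product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def classify(sample, labelled_points):
    """Assumed rule: return the label of the most similar labelled vector."""
    best_vector, best_label = max(
        labelled_points,
        key=lambda pair: cosine_similarity(sample, pair[0]),
    )
    return best_label


# Made-up 2-D points (for example, two PCA components), for illustration only.
train = [([1.0, 0.1], "M"), ([-1.0, 0.2], "B")]

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0: orthogonal vectors
print(classify([3.0, 0.5], train))                 # M
```

Because only the angle matters, `[1.0, 2.0]` and `[2.0, 4.0]` are treated as identical in direction even though their magnitudes differ.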
Confusion Matrices
With PCA (99.12%) Without PCA (78.9%)
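The two accuracy figures can be checked with simple arithmetic. The slides do not state the size of the test set; the sketch below assumes 114 test instances (roughly 20% of the 569 instances), which is an inference. Under that assumption, the roughly 24 misclassified instances mentioned in the conclusion reproduce the figure without PCA, and a single error reproduces the figure with PCA.

```python
def accuracy(n_test, n_errors):
    """Fraction of test instances that were classified correctly."""
    return (n_test - n_errors) / n_test


# Assumption: 114 test instances (about 20% of 569); not stated in the slides.
n_test = 114

without_pca = round(100 * accuracy(n_test, 24), 2)   # 24 errors, as in the conclusion
with_pca = round(100 * accuracy(n_test, 1), 2)       # 1 error (inferred, not stated)

print(without_pca)   # 78.95
print(with_pca)      # 99.12
```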
CONCLUSION
◼ Based on the experiments, we can conclude that cosine
similarity learning works effectively along with the PCA
algorithm. Using this combination clearly improved the
results. Without PCA the accuracy is 78.9%, with about 24
false negatives among the testing instances, while
with PCA the accuracy increases to 99.12%, which
justifies this combination. As a result, machine learning
with effective feature selection gives a reliable outcome
on a vital problem in the health community.
THANK YOU
