Leveraging Machine Learning for Breast Cancer Prediction
Presented by: Arwa Marfatia
Introduction
• Machine learning technologies have a wide range of potential uses
in healthcare, from improving patient data, medical research,
diagnosis and treatment, to reducing costs and improving patient
safety.
• Breast cancer is one of the most common cancers in women and is
associated with various clinical, lifestyle, social and economic
factors.
• Machine learning, with its predictive capabilities, offers a
transformative approach to understanding and predicting breast
cancer in patients.
Through data-driven insights and predictive modeling, this presentation aims to showcase my
Machine Learning Capstone Project focused on predicting breast cancer in the Healthcare
Sector.
Why Healthcare Domain?
Machine learning provides an exciting opportunity in healthcare to improve the
accuracy of diagnoses, personalize healthcare, and find novel solutions to
decades-old problems.
Application of Machine Learning in Healthcare:
• Improve trauma-care response: By creating sensors and devices that can send a
patient’s vital information to the hospital before they arrive via ambulance or
other emergency transport, there is less time between when the patient arrives
and when they are able to receive life-saving treatment.
• Disease prediction: You can use machine learning to find trends, create
connections, and make conclusions based on large data sets. This can include
predicting disease outbreaks in communities and tracking habits leading to
patient disease.
• Visualization of biomedical data: You can use machine learning to create
three-dimensional visualisations of biomedical data such as RNA sequences, protein
structure, and genomic profiles.
• Improved diagnosis and disease identification: Identify previously
unrecognisable symptom patterns and compare them with larger data sets to
diagnose diseases earlier in their development.
Project’s Significance and its Benefits to Healthcare
• Early Diagnosis: Combining multiple risk factors in modeling for breast cancer
prediction could help the early diagnosis of the disease with necessary care plans.
• Data management: Collecting, storing, and managing diverse data, together with
intelligent systems that predict breast cancer from multiple factors, supports
effective disease management.
• Visualization of biomedical data: You can use machine learning to create
three-dimensional visualizations of biomedical data such as RNA sequences, protein
structure, and genomic profiles.
• Improved diagnosis and disease identification: Identify previously unrecognisable
symptom patterns and compare them with larger data sets to diagnose diseases
earlier in their development.
Dataset Information
Here are the key details about the dataset used in this project:
• Number of records: Our dataset is comparatively small, comprising 569 records.
Each record represents a unique entry.
• Features/Columns: The dataset is characterized by a diverse set of features.
Features are computed from a digitized image of a fine needle aspirate (FNA) of
a breast mass. They describe characteristics of the cell nuclei present in the
image. In total, there are 30 features/columns that form the basis of our
predictive modeling.
• Source of the Data: The dataset was obtained from Kaggle. Because the features
were measured on real FNA samples, our analysis is grounded in real-world
clinical data.
Exploratory Data Analysis (EDA)
• Exploring the data allowed us to gain a comprehensive overview of the
data's structure. It uncovered potential patterns, helped us identify key
trends and get essential insights from the dataset.
• Throughout the EDA process, we analyzed the distribution of individual
features, investigated correlations, and explored any inherent
relationships between variables.
• Visualizations also played a crucial role in providing a clear
representation of the data, offering insights into breast cancer
prediction.
• First, we checked the dataset for null values and duplicates. Only one column contained null
values; it was entirely null, so it was dropped. Otherwise, the dataset was clean to begin with.
• Then, we checked whether each column provided useful information for us to work with. We
found that columns like “ID” and “Unnamed 32” weren't contributing to the predictions.
Hence, we decided to drop them during preprocessing.
• Some columns were highly correlated with one another. Such redundant columns can lead to
overfitting, so they were dropped.
• To ensure consistent scales for numerical features, we decided to employ Standard Scaler
during preprocessing.
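The cleaning checks described above can be sketched roughly as follows. This is a minimal illustration on a made-up four-row table, not the project's actual notebook: the column names (`id`, `diagnosis`, `radius_mean`, `texture_mean`, `Unnamed: 32`) and all values are assumptions chosen to resemble the Kaggle file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset; names and values are assumptions.
df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [17.99, 13.54, 12.45, 20.57],
    "texture_mean": [10.38, 14.36, 15.70, 17.77],
    "Unnamed: 32": [np.nan] * 4,  # a column holding only nulls
})

# Count null values per column and fully duplicated rows.
null_counts = df.isnull().sum()
duplicate_rows = int(df.duplicated().sum())

# Columns that contain nothing but nulls carry no information.
all_null_cols = [col for col in df.columns if df[col].isnull().all()]

# Drop the all-null column together with the identifier column.
cleaned = df.drop(columns=all_null_cols + ["id"])

print(null_counts)
print("duplicate rows:", duplicate_rows)
print("remaining columns:", list(cleaned.columns))
```

Dropping the identifier is a design choice rather than a cleaning necessity: an ID has no relationship to the diagnosis, so keeping it would only give a model a chance to memorise rows.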
Exploratory Data Analysis (EDA)
Visualizations
Our target variable ‘Diagnosis’ has 357 Benign
(Negative cases) and 212 Malignant (Positive
cases).
The correlation heatmap shows multicollinearity among several columns. As a result, some of
these columns will be dropped.
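One common way to act on such a heatmap is to drop one column from every highly correlated pair. The sketch below is an assumption about how this could be done, not the method used in the project: the helper `drop_highly_correlated`, the 0.9 threshold, and the synthetic columns are all hypothetical.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.9):
    """Drop the later column of each pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Synthetic data: perimeter is almost a linear function of radius,
# while texture is generated independently.
rng = np.random.default_rng(0)
radius = rng.normal(14.0, 3.0, size=200)
data = pd.DataFrame({
    "radius": radius,
    "perimeter": 2 * np.pi * radius + rng.normal(0.0, 0.1, size=200),
    "texture": rng.normal(19.0, 4.0, size=200),
})

reduced, dropped = drop_highly_correlated(data, threshold=0.9)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

Which column of a correlated pair is removed depends on column order here; a project might instead keep the column that is easier to interpret.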
Preprocessing
• First, the “ID” and “Unnamed 32” columns were dropped as they didn’t provide any useful
information for our predictions.
• Since there is multicollinearity, columns that were highly correlated with other
columns were dropped.
• Then, we encoded the categorical data as numerical data with the help of mapping.
Mapping assigns a binary numeric value to each unique class in the categorical
column.
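The mapping step could look like the following sketch. The letter codes "M" and "B" and the choice of 1 for malignant (the positive class) and 0 for benign are assumptions for illustration; the slides only state that a binary mapping was used.

```python
import pandas as pd

# Hypothetical diagnosis column; the "M"/"B" codes are an assumption.
diagnosis = pd.Series(["M", "B", "B", "M", "B"], name="diagnosis")

# Map each class to a binary value: malignant -> 1, benign -> 0.
encoded = diagnosis.map({"M": 1, "B": 0})

print(encoded.tolist())
```

Any label missing from the mapping dictionary becomes NaN with `Series.map`, so it is worth checking for nulls after encoding.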
Splitting the data into X and y
• In this step, we partitioned the dataset into two components: X and y.
• The variable X encompasses all independent variables, representing the features
that contribute to our predictions.
• On the other hand, y encapsulates the dependent variable or target variable,
serving as the outcome we aim to predict.
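As a small illustration of this partition, the sketch below separates a made-up table into features and target. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical preprocessed table; names and values are assumptions.
df = pd.DataFrame({
    "radius_mean": [17.99, 13.54, 12.45, 20.57],
    "texture_mean": [10.38, 14.36, 15.70, 17.77],
    "diagnosis": [1, 0, 0, 1],
})

# X holds every independent variable; y holds the target to predict.
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]

print(X.shape, y.shape)
```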
Train-Test Split
• We then split the dataset into training data and testing data.
• We did an 80:20 split, meaning 80% of our data is Training Data and 20% of our data is
Testing Data. So, our test size was set to 0.2.
• We took Random State as 40. This guaranteed the reproducibility of our results across
different runs.
• We also used Stratify = y to ensure that the classes of our Target Variable (y) appear in the
same proportions in both the training data and the testing data.
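The split settings listed above (test size 0.2, random state 40, stratification on y) can be sketched with scikit-learn as follows. As an assumption, the example loads scikit-learn's built-in copy of the Wisconsin diagnostic data as a stand-in for the Kaggle CSV; note that in this copy the label 1 means benign, which may differ from the project's own encoding.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): scikit-learn's bundled 569 x 30 dataset.
X, y = load_breast_cancer(return_X_y=True)

# 80:20 split, reproducible via random_state, class balance kept via stratify.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=40, stratify=y
)

# Share of label 1 (benign in this copy) overall and in each part.
overall_rate = y.mean()
train_rate = y_train.mean()
test_rate = y_test.mean()

print(X_train.shape, X_test.shape)
print(f"overall={overall_rate:.3f} train={train_rate:.3f} test={test_rate:.3f}")
```

With stratification, the three printed rates should be nearly identical; without it, a small test set could by chance contain noticeably more or fewer malignant cases than the full data.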
Standard Scaler
• We used Standard Scaler to standardize the features of the dataset.
• This put all features on a comparable scale: each feature is rescaled to zero mean and unit variance.
• Standardization is crucial for certain machine learning algorithms, promoting optimal
model performance by mitigating the influence of varying magnitudes among features.
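To make the standardization step concrete, the sketch below applies `StandardScaler` to a tiny made-up array. Each value is transformed as z = (x - mean) / std, using the mean and standard deviation of the training data. Fitting the scaler on the training data only and reusing it for the test data is an assumption here; the slides do not say how the scaler was fitted.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up features with very different magnitudes.
X_train = np.array([[10.0, 200.0],
                    [12.0, 400.0],
                    [14.0, 600.0]])
X_test = np.array([[12.0, 800.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics for the test data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # close to 0 for each feature
print(X_train_scaled.std(axis=0))   # close to 1 for each feature
print(X_test_scaled)
```

Fitting on the training data alone keeps information about the test set from leaking into preprocessing.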
Applying Machine Learning Algorithms
The Breast Cancer Prediction problem is a Binary Classification problem.
Models used:
• Logistic Regression : Logistic Regression is a powerful tool in binary classification. It's very good at
modeling the probability of an event occurring, making it suitable for scenarios where understanding the
likelihood that a tumor is malignant is essential.
• Random Forest : It is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
• Decision Tree : A decision tree is a supervised learning algorithm that models decisions based on input
features.
• Support Vector Machine (SVC) : Support Vector Classification is a robust algorithm employed for
classification tasks, especially when there's a need for clear separation between classes.
• Naive Bayes : Naive Bayes is a probabilistic classification algorithm known for its simplicity and efficiency.
It assumes that features are independent given the class, which makes calculations easier. It's often used
when simplicity and speed are crucial.
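The five model families listed above could be trained side by side roughly as follows. This is a sketch under several assumptions, not the project's code: it uses scikit-learn's bundled copy of the Wisconsin diagnostic data instead of the Kaggle CSV, keeps all 30 features, uses default hyperparameters (apart from `max_iter` and `random_state`), and picks `GaussianNB` as the Naive Bayes variant. Scores from this sketch therefore need not match the project's results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): scikit-learn's bundled 569 x 30 dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=40, stratify=y
)

# One instance per model family; hyperparameters are illustrative defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=40),
    "Decision Tree": DecisionTreeClassifier(random_state=40),
    "SVC": SVC(),
    "Naive Bayes": GaussianNB(),
}

accuracies = {}
for name, model in models.items():
    # Scale inside a pipeline so the scaler is fitted on training data only.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    accuracies[name] = pipeline.score(X_test, y_test)

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```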
Model Selection and Considerations
• SVC outperforms Logistic Regression, Random Forest, Decision Tree and Naive Bayes in
all metrics, demonstrating higher Accuracy, Precision, Recall, and F1-Score. It seems to
be a promising model for our task.
• Based on the provided metrics, SVC stands out as the best-performing model overall. It
achieves a good balance between precision and recall, making it suitable for our Breast
Cancer prediction task.
• Hence, we will go with Support Vector Classification as our final model as it is quite
evident that it performs best for our Breast Cancer problem.
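For readers unfamiliar with the four metrics used in this comparison, the sketch below computes them on ten made-up predictions (1 = malignant, 0 = benign). The labels are invented purely for illustration and are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 3 true positives, 1 false negative,
# 1 false positive, 5 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 8 / 10
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

print(accuracy, precision, recall, f1)
```

In a cancer screening setting, recall deserves particular attention, because a false negative means a malignant case is missed.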
Conclusion
• With the help of several insights, patterns and trends in our data, we’ve used Machine
Learning to address the intricate challenge of predicting Breast Cancer.
• This project offers significant benefits to healthcare:
  • Combining multiple risk factors in modeling for breast cancer prediction could help
  the early diagnosis of the disease with necessary care plans.
  • Collecting, storing, and managing diverse data, together with intelligent systems
  that predict breast cancer from multiple factors, supports effective disease
  management.
  • The proposed machine-learning approaches could predict breast cancer early. Early
  detection could help slow the progress of the disease and reduce the mortality
  rate through appropriate therapeutic interventions at the right time.
Thank You !

OS-operating systems- ch05 (CPU Scheduling) ...
 
21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17
 
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lessonQUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
 
How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17
 
Michaelis Menten Equation and Estimation Of Vmax and Tmax.pptx
Michaelis Menten Equation and Estimation Of Vmax and Tmax.pptxMichaelis Menten Equation and Estimation Of Vmax and Tmax.pptx
Michaelis Menten Equation and Estimation Of Vmax and Tmax.pptx
 
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community PartnershipsSpring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
Rich Dad Poor Dad ( PDFDrive.com )--.pdf
Rich Dad Poor Dad ( PDFDrive.com )--.pdfRich Dad Poor Dad ( PDFDrive.com )--.pdf
Rich Dad Poor Dad ( PDFDrive.com )--.pdf
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
 
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdfPUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
 
Graduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxGraduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptx
 

Breast Cancer Prediction - Arwa Marfatia.pptx

  • 1.
  • 2. Leveraging Machine Learning for Breast Cancer Prediction Presented by : Arwa Marfatia
  • 3. Introduction • Machine Learning technologies have a wide range of potential uses in healthcare, from improving patient data, medical research, diagnosis and treatment to reducing costs and improving patient safety. • Breast Cancer is considered one of the most common cancers in women and is associated with various clinical, lifestyle, social and economic factors. • Machine learning, with its predictive capabilities, offers a transformative approach to understanding and predicting breast cancer in patients. Through data-driven insights and predictive modeling, this presentation aims to showcase my Machine Learning Capstone Project focused on predicting breast cancer in the Healthcare Sector.
  • 4. Why Healthcare Domain? Machine learning provides an exciting opportunity in healthcare to improve the accuracy of diagnoses, personalize healthcare, and find novel solutions to decades-old problems. Application of Machine Learning in Healthcare: • Improve trauma-care response: By creating sensors and devices that can send a patient’s vital information to the hospital before they arrive via ambulance or other emergency transport, there is less time between when the patient arrives and when they are able to receive life-saving treatment. • Disease prediction: You can use machine learning to find trends, create connections, and make conclusions based on large data sets. This can include predicting disease outbreaks in communities and tracking habits leading to patient disease. • Visualization of biomedical data: You can use machine learning to create three-dimensional visualisations of biomedical data such as RNA sequences, protein structure, and genomic profiles. • Improved diagnosis and disease identification: Identify previously unrecognisable symptom patterns and compare them with larger data sets to diagnose diseases earlier in their development.
  • 5. Project’s Significance and its Benefits to Healthcare • Early Diagnosis: Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans. • Collection, storage, and management: Collecting, storing, and managing different data, together with intelligent systems that use multiple factors to predict breast cancer, supports effective disease management. • Visualization of biomedical data: You can use machine learning to create three-dimensional visualizations of biomedical data such as RNA sequences, protein structure, and genomic profiles. • Improved diagnosis and disease identification: Identify previously unrecognisable symptom patterns and compare them with larger data sets to diagnose diseases earlier in their development.
  • 6. Dataset Information Here are the key details about the dataset used in this project: • Number of records: Our dataset is comparatively small, consisting of 569 records. Each record represents a unique entry. • Features/Columns: The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. In total, there are 30 features/columns that form the basis of our predictive modeling. • Source of the Data: The dataset is sourced from Kaggle. The data's origin shapes the context of the analysis and helps ground it in real-world scenarios.
  • 7. Exploratory Data Analysis (EDA) • Exploring the data allowed us to gain a comprehensive overview of the data's structure. It uncovered potential patterns, helped us identify key trends and get essential insights from the dataset. • Throughout the EDA process, we analyzed the distribution of individual features, investigated correlations, and explored any inherent relationships between variables. • Visualizations also played a crucial role in providing a clear representation of the data, offering insights into breast cancer prediction.
  • 8. Exploratory Data Analysis (EDA) • First, we checked the dataset for null values and duplicates. Only one column contained null values, and it was dropped because it contained nothing else. Otherwise, our dataset was clean to begin with. • Then, we checked whether each column provided useful information for us to work with. We found that columns like “ID” and “Unnamed 32” weren't contributing much to the predictions. Hence, we decided to drop them during preprocessing. • Some columns were highly correlated with each other and could lead to overfitting, so they were dropped. • To ensure consistent scales for numerical features, we decided to employ Standard Scaler during preprocessing.
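The null and duplicate checks described on this slide can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the rows, the `feature_1` and `feature_2` columns, and the "B"/"M" labels are invented placeholders, and only the "ID", "Unnamed 32" and "Diagnosis" column names come from the slides.

```python
# Toy rows standing in for the dataset. All values are invented for illustration.
rows = [
    {"ID": 101, "feature_1": 17.9, "feature_2": 10.4, "Diagnosis": "M", "Unnamed 32": None},
    {"ID": 102, "feature_1": 13.5, "feature_2": 14.4, "Diagnosis": "B", "Unnamed 32": None},
    {"ID": 103, "feature_1": 20.6, "feature_2": 17.8, "Diagnosis": "M", "Unnamed 32": None},
]


def all_null_columns(rows):
    """Return the columns whose value is None in every row."""
    return [col for col in rows[0] if all(r[col] is None for r in rows)]


def count_duplicates(rows):
    """Count rows that repeat an earlier row exactly."""
    columns = list(rows[0])
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates


empty_columns = all_null_columns(rows)
to_drop = set(empty_columns) | {"ID"}  # the identifier carries no predictive signal
cleaned = [{c: v for c, v in r.items() if c not in to_drop} for r in rows]

print("All-null columns:", empty_columns)
print("Duplicate rows:", count_duplicates(rows))
print("Remaining columns:", list(cleaned[0]))
```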
  • 9. Visualizations Our target variable ‘Diagnosis’ has 357 Benign (negative) cases and 212 Malignant (positive) cases.
  • 10. The heatmap shows multicollinearity among the columns. As a result, some columns will be dropped.
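One common way to act on a correlation heatmap is to keep a column only if its absolute Pearson correlation with every column already kept stays below a cut-off. The sketch below shows that idea on made-up data. The slides do not say which columns were dropped or which cut-off was used, so the 0.9 threshold, the column names, and the greedy keep-first rule are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep columns whose |r| with every kept column is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: "size_doubled" is an exact multiple of "size".
columns = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "size_doubled": [2.0, 4.0, 6.0, 8.0, 10.0],
    "texture": [2.0, 1.0, 4.0, 3.0, 3.5],
}

kept = drop_correlated(columns)
print("Kept columns:", kept)
```

Which member of a correlated pair survives depends on column order here; a real analysis might instead keep the column that is more informative for the target.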
  • 11. Preprocessing • First, the “ID” and “Unnamed 32” columns were dropped as they didn’t provide any useful information for our predictions. • Since there is multicollinearity, columns that were highly correlated with others were dropped. • Then, we encoded the categorical data as numerical data with the help of mapping, which assigns a binary numeric value to each unique class in the categorical column. Splitting the data into X and y • In this step, we partitioned the dataset into two components: X and y. • The variable X encompasses all independent variables, representing the features that contribute to our predictions. • On the other hand, y encapsulates the dependent variable or target variable, serving as the outcome we aim to predict.
  • 12. Train-Test Split • We then split the dataset into training data and testing data. • We did an 80:20 split, meaning 80% of our data is Training Data and 20% of our data is Testing Data. So, our test size was set to 0.2. • We set Random State to 40 so that the split is reproducible across different runs. • We also used Stratify = y to ensure that the classes of our Target Variable (y) are distributed proportionally between the training and testing data. Standard Scaler • We used Standard Scaler to standardize the features of the dataset. • This put all features on a consistent scale. • Standardization is crucial for certain machine learning algorithms, promoting optimal model performance by mitigating the influence of varying magnitudes among features.
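To make the two steps on this slide concrete, here is a small pure-Python sketch of a stratified 80:20 split followed by standardization, z = (x - mean) / std. It is a hand-rolled illustration and not the library routine the slides refer to: the helper names are hypothetical, the feature values are made up, and seeding Python's `random` with 40 will not reproduce the project's exact split. Only the class counts (357 benign, 212 malignant) come from the slides.

```python
import random
import statistics


def stratified_split(y, test_size=0.2, seed=40):
    """Split indices so each class contributes about test_size of its rows to the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, value in enumerate(y) if value == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def fit_scaler(values):
    """Learn the mean and population standard deviation from training values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values with previously learned statistics."""
    return [(v - mean) / std for v in values]


# Class counts from the slides: 357 benign (0) and 212 malignant (1).
y = [0] * 357 + [1] * 212
train_idx, test_idx = stratified_split(y)

# One made-up feature column, just to have something to scale.
feature = [float(i % 17) for i in range(len(y))]
mean, std = fit_scaler([feature[i] for i in train_idx])
train_scaled = transform([feature[i] for i in train_idx], mean, std)
test_scaled = transform([feature[i] for i in test_idx], mean, std)

print("Train/test sizes:", len(train_idx), len(test_idx))
print("Malignant share overall:", round(212 / 569, 3))
print("Malignant share in test:", round(sum(y[i] for i in test_idx) / len(test_idx), 3))
```

The scaler statistics are learned from the training rows and then reused on the test rows, so no information from the test set leaks into preprocessing.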
  • 13. Applying Machine Learning Algorithms The Breast Cancer Prediction problem is a Binary Classification problem. Models used: • Logistic Regression : Logistic Regression is a powerful tool in binary classification. It's very good at modeling the probability of an event occurring, making it suitable for scenarios where understanding the likelihood that cells are cancerous is essential. • Random Forest : It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model. • Decision Tree : A decision tree is a supervised learning algorithm that models decisions based on input features. • Support Vector Machine (SVC) : Support Vector Classification is a robust algorithm employed for classification tasks, especially when there's a need for clear separation between classes. • Naive Bayes : Naive Bayes is a probabilistic classification algorithm known for its simplicity and efficiency. It assumes that features are independent, making calculations easier. It's often used when simplicity and speed are crucial.
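A comparison of the five model families could be set up roughly as below with scikit-learn. Several things here are assumptions rather than the project's actual configuration: the copy of the Wisconsin diagnostic data bundled with scikit-learn (569 records, 30 features) stands in for the Kaggle file, no correlated columns are dropped, all hyperparameters are library defaults apart from `max_iter`, and `GaussianNB` is chosen as the Naive Bayes variant. Note that the bundled data encodes malignant as 0 and benign as 1.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Bundled stand-in for the Kaggle dataset (an assumption, see the note above).
X, y = load_breast_cancer(return_X_y=True)

# 80:20 stratified split with the random state mentioned in the slides.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=40, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=40),
    "Decision Tree": DecisionTreeClassifier(random_state=40),
    "SVC": SVC(),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    # The scaler is fitted on the training data only, inside the pipeline.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    scores[name] = pipeline.score(X_test, y_test)  # test-set accuracy
    print(f"{name}: {scores[name]:.3f}")
```

Because the preprocessing differs from the project's, the accuracies printed by this sketch should not be read as the project's results.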
  • 14. Model Selection and Considerations • SVC outperforms Logistic Regression, Random Forest, Decision Tree and Naive Bayes in all metrics, demonstrating higher Accuracy, Precision, Recall, and F1-Score. It seems to be a promising model for our task. • Based on the provided metrics, SVC stands out as the best-performing model overall. It achieves a good balance between precision and recall, making it suitable for our Breast Cancer prediction task. • Hence, we will go with Support Vector Classification as our final model as it is quite evident that it performs best for our Breast Cancer problem.
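For reference, the four metrics named on this slide can be computed from the counts of true and false positives and negatives. The sketch below uses ten invented labels and predictions purely to show the arithmetic; they are not the project's results, and the helper name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented example: 1 = malignant (positive), 0 = benign (negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a screening setting, recall on the malignant class is often watched most closely, since a false negative means a missed cancer.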
  • 15. Conclusion • With the help of several insights, patterns and trends in our data, we’ve used Machine Learning to address the intricate challenge of predicting Breast Cancer. • This project offers significant benefits to healthcare: ◦ Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans. ◦ Collecting, storing, and managing different data, together with intelligent systems that use multiple factors to predict breast cancer, supports effective disease management. ◦ The proposed machine-learning approaches could help detect breast cancer early, and early detection could help slow the progress of the disease and reduce the mortality rate through appropriate, timely therapeutic interventions.