SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Topic:
Machine learning           Synthetic minority over-
Imbalanced data sets   sampling technique (SMOTE)
                        Presented by Hector Franco
                                               TCD
Basic concepts

     Introduction
1.
     Recent developments
2.
     Algorithms description.
3.
     Evaluation.
4.
     Discursion.
5.
0
Multi class problems are imbalance when we

    compare one against all.
    In some cases the data set is very small, to

    generalize well.
    Text classification is an example of imbalanced

    data.
    It can be use with tree-kernels.

Effect of SMOTE and DEC – (SDC)




 After DEC   alone    After SMOTE
 and DEC
: Majority sample
: Minority sample
: Synthetic sample
                                         6
introduction
By convention the class with less number of

    examples is called minority or positive
    samples.
The recent developments in
imbalanced data sets learning
Between-class imbalanced.

    (where we focused on)
    Within-class imbalanced.



    It is important in text classification.

    We focused on the minority class, we want a

    high prediction for the minority class..
    Two class problem = multiclass problem .

NOT VERY GOOD
                         IN UNBALANCED
                              DATA




Popular evaluation for
 imbalance problem.
 Usually B=1, and =1
    in this paper
AUC:
TP rate
          AREA
          UNDER
          ROC


                  FP rate
Data level: Change the distribution

    ◦ make the data balanced
    Modify the existing data mining algorithms

    ◦ Make new algorithms
Random oversampling: duplicate

    Random under sampling: (can remove

    important data)
    Remove noise

    SMOTE

    Combine under sampling and over sampling.

    Find the hard examples and over sample

    them.
Adaboost (increase weights of misclassified),

    it does not perform well on imbalances ds. 
    Improve updated weights of TP & FP, better
    than weights of prediction based on TP & FP.
    Use a kernel of SVM

    Use a BMPM

    Biased Mini max Probability Machine.
    There are other cost-based learning…

A new Over-Sampling Method:
Borderline-SMOTE.
Algorithms usually

    try to learn the
    borderline, as
    exactly as possible.
Borderline-SMOTE1

    Borderline-SMOTE2

Also oversampling the majority class.

    The random numbers are between 0 and 0.5

    so the synthetic examples are more close to
    each other.
Experiments
Nothing: base line.

    SMOTE

    Random over-sampling

    Borderline-SMOTE1

    Borderline-SMOTE2



    K=5

    10 Fold cross validation.

    C4.5 classified

    We only want to improve the prediction of the

    minority class
conclusion
Is a common problem to work with

    imbalanced data sets.
    Borderline examples are more easy to

    misclassified.
    Our methods are better than traditional

    SMOTE.
    Open to research:

    ◦ how to define DANGER examples.
    ◦ Determination of number of examples in DANGER.
    ◦ Combine to data mining algorithms.
You are free:
•to copy, distribute, display, and perform the work
•to make derivative works

Under the following conditions:
•Attribution. You must give the original author credit.
What does quot;Attribute this workquot; mean?
The page you came from contained embedded licensing metadata, including how the
creator wishes to be attributed for re-use. You can use the HTML here to cite the work.
Doing so will also include metadata on your page so that others can find the original work
as well.

•Non-Commercial. You may not use this work for commercial purposes.
•For any reuse or distribution, you must make clear to others the licence terms of this
work.
•Any of these conditions can be waived if you get permission from the copyright holder.
•Nothing in this license impairs or restricts the author's moral rights.

Weitere ähnliche Inhalte

Was ist angesagt?

Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter TuningJon Lederman
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine LearningKnoldus Inc.
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Machine Learning - Splitting Datasets
Machine Learning - Splitting DatasetsMachine Learning - Splitting Datasets
Machine Learning - Splitting DatasetsAndrew Ferlitsch
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Kush Kulshrestha
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkKnoldus Inc.
 
Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade offVARUN KUMAR
 
07 regularization
07 regularization07 regularization
07 regularizationRonald Teo
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and ApplicationsHoang Nguyen
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning Mohammad Junaid Khan
 
Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242Josh Patterson
 
Stochastic Gradient Decent (SGD).pptx
Stochastic Gradient Decent (SGD).pptxStochastic Gradient Decent (SGD).pptx
Stochastic Gradient Decent (SGD).pptxShubham Jaybhaye
 

Was ist angesagt? (20)

Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
Transfer Learning
Transfer LearningTransfer Learning
Transfer Learning
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Machine Learning - Splitting Datasets
Machine Learning - Splitting DatasetsMachine Learning - Splitting Datasets
Machine Learning - Splitting Datasets
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees
 
Artificial Neural Networks for Data Mining
Artificial Neural Networks for Data MiningArtificial Neural Networks for Data Mining
Artificial Neural Networks for Data Mining
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade off
 
07 regularization
07 regularization07 regularization
07 regularization
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242
 
Stochastic Gradient Decent (SGD).pptx
Stochastic Gradient Decent (SGD).pptxStochastic Gradient Decent (SGD).pptx
Stochastic Gradient Decent (SGD).pptx
 
Perceptron
PerceptronPerceptron
Perceptron
 

Andere mochten auch

Racing for unbalanced methods selection
Racing for unbalanced methods selectionRacing for unbalanced methods selection
Racing for unbalanced methods selectionAndrea Dal Pozzolo
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data萍華 楊
 
Predictive Modeling: Predict Premium Subscriber for a Leading International M...
Predictive Modeling: Predict Premium Subscriber for a Leading International M...Predictive Modeling: Predict Premium Subscriber for a Leading International M...
Predictive Modeling: Predict Premium Subscriber for a Leading International M...Kaushik Nuvvula
 
Ensemble of Exemplar-SVM for Object Detection and Beyond
Ensemble of Exemplar-SVM for Object Detection and BeyondEnsemble of Exemplar-SVM for Object Detection and Beyond
Ensemble of Exemplar-SVM for Object Detection and Beyondzukun
 
Présentation ardia pfe sabrine gharbi 2015 slide share
Présentation ardia pfe sabrine gharbi 2015 slide sharePrésentation ardia pfe sabrine gharbi 2015 slide share
Présentation ardia pfe sabrine gharbi 2015 slide sharegharbi sabrine
 
Tong quan ve phan cum data mining
Tong quan ve phan cum   data miningTong quan ve phan cum   data mining
Tong quan ve phan cum data miningHoa Chu
 
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...osify
 
Support Vector Machine without tears
Support Vector Machine without tearsSupport Vector Machine without tears
Support Vector Machine without tearsAnkit Sharma
 
はじめてのパターン認識 第5章 k最近傍法(k_nn法)
はじめてのパターン認識 第5章 k最近傍法(k_nn法)はじめてのパターン認識 第5章 k最近傍法(k_nn法)
はじめてのパターン認識 第5章 k最近傍法(k_nn法)Motoya Wakiyama
 
不均衡データのクラス分類
不均衡データのクラス分類不均衡データのクラス分類
不均衡データのクラス分類Shintaro Fukushima
 

Andere mochten auch (12)

Racing for unbalanced methods selection
Racing for unbalanced methods selectionRacing for unbalanced methods selection
Racing for unbalanced methods selection
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data
 
Predictive Modeling: Predict Premium Subscriber for a Leading International M...
Predictive Modeling: Predict Premium Subscriber for a Leading International M...Predictive Modeling: Predict Premium Subscriber for a Leading International M...
Predictive Modeling: Predict Premium Subscriber for a Leading International M...
 
Ensemble of Exemplar-SVM for Object Detection and Beyond
Ensemble of Exemplar-SVM for Object Detection and BeyondEnsemble of Exemplar-SVM for Object Detection and Beyond
Ensemble of Exemplar-SVM for Object Detection and Beyond
 
Présentation ardia pfe sabrine gharbi 2015 slide share
Présentation ardia pfe sabrine gharbi 2015 slide sharePrésentation ardia pfe sabrine gharbi 2015 slide share
Présentation ardia pfe sabrine gharbi 2015 slide share
 
Présentation pfe
Présentation pfePrésentation pfe
Présentation pfe
 
Tong quan ve phan cum data mining
Tong quan ve phan cum   data miningTong quan ve phan cum   data mining
Tong quan ve phan cum data mining
 
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
 
Support Vector Machine without tears
Support Vector Machine without tearsSupport Vector Machine without tears
Support Vector Machine without tears
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
はじめてのパターン認識 第5章 k最近傍法(k_nn法)
はじめてのパターン認識 第5章 k最近傍法(k_nn法)はじめてのパターン認識 第5章 k最近傍法(k_nn法)
はじめてのパターン認識 第5章 k最近傍法(k_nn法)
 
不均衡データのクラス分類
不均衡データのクラス分類不均衡データのクラス分類
不均衡データのクラス分類
 

Ähnlich wie Borderline Smote

Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningSergey Karayev
 
Simulating data to gain insights into power and p-hacking
Simulating data to gain insights intopower and p-hackingSimulating data to gain insights intopower and p-hacking
Simulating data to gain insights into power and p-hackingDorothy Bishop
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentationHJ van Veen
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningKai Koenig
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
Demystifying Machine Learning
Demystifying Machine LearningDemystifying Machine Learning
Demystifying Machine LearningAyodele Odubela
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”Dr.(Mrs).Gethsiyal Augasta
 
NITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptxNITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptxDrKBManwade
 
NITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptxNITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptxssuserd23711
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptxMonicaTimber
 
Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction CS, NcState
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
Query Linguistic Intent Detection
Query Linguistic Intent DetectionQuery Linguistic Intent Detection
Query Linguistic Intent Detectionbutest
 
Deep learning concepts
Deep learning conceptsDeep learning concepts
Deep learning conceptsJoe li
 
Deep learning architectures
Deep learning architecturesDeep learning architectures
Deep learning architecturesJoe li
 
Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain ShiftStrong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain ShiftSebastian Ruder
 

Ähnlich wie Borderline Smote (20)

Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
 
Simulating data to gain insights into power and p-hacking
Simulating data to gain insights intopower and p-hackingSimulating data to gain insights intopower and p-hacking
Simulating data to gain insights into power and p-hacking
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
lec1.ppt
lec1.pptlec1.ppt
lec1.ppt
 
ICML2015 Slides
ICML2015 SlidesICML2015 Slides
ICML2015 Slides
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
Demystifying Machine Learning
Demystifying Machine LearningDemystifying Machine Learning
Demystifying Machine Learning
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
 
NITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptxNITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptx
 
NITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptxNITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptx
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptx
 
Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction
 
Deep learning
Deep learningDeep learning
Deep learning
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
Query Linguistic Intent Detection
Query Linguistic Intent DetectionQuery Linguistic Intent Detection
Query Linguistic Intent Detection
 
Deep learning concepts
Deep learning conceptsDeep learning concepts
Deep learning concepts
 
Deep learning architectures
Deep learning architecturesDeep learning architectures
Deep learning architectures
 
Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain ShiftStrong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain Shift
 

Mehr von Trector Rancor

Cryptocurrencies overview
Cryptocurrencies overviewCryptocurrencies overview
Cryptocurrencies overviewTrector Rancor
 
Tree distance algorithm
Tree distance algorithmTree distance algorithm
Tree distance algorithmTrector Rancor
 
A Comparative Study On Featuree Selection In Text2
A Comparative Study On Featuree Selection In Text2A Comparative Study On Featuree Selection In Text2
A Comparative Study On Featuree Selection In Text2Trector Rancor
 

Mehr von Trector Rancor (7)

Cryptocurrencies overview
Cryptocurrencies overviewCryptocurrencies overview
Cryptocurrencies overview
 
Tree distance algorithm
Tree distance algorithmTree distance algorithm
Tree distance algorithm
 
Virtual Journalist
Virtual JournalistVirtual Journalist
Virtual Journalist
 
Class Diagram Uml
Class Diagram UmlClass Diagram Uml
Class Diagram Uml
 
A Comparative Study On Featuree Selection In Text2
A Comparative Study On Featuree Selection In Text2A Comparative Study On Featuree Selection In Text2
A Comparative Study On Featuree Selection In Text2
 
going to uni
going to unigoing to uni
going to uni
 
My First Presentation
My First PresentationMy First Presentation
My First Presentation
 

Kürzlich hochgeladen

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Kürzlich hochgeladen (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Borderline Smote

  • 1. Topic: Machine learning Synthetic minority over- Imbalanced data sets sampling technique (SMOTE) Presented by Hector Franco TCD
  • 2. Basic concepts  Introduction 1. Recent developments 2. Algorithms description. 3. Evaluation. 4. Discursion. 5.
  • 3. 0
  • 4. Multi class problems are imbalance when we  compare one against all. In some cases the data set is very small, to  generalize well. Text classification is an example of imbalanced  data. It can be use with tree-kernels. 
  • 5. Effect of SMOTE and DEC – (SDC) After DEC alone After SMOTE and DEC
  • 6. : Majority sample : Minority sample : Synthetic sample 6
  • 7.
  • 9. By convention the class with less number of  examples is called minority or positive samples.
  • 10. The recent developments in imbalanced data sets learning
  • 11. Between-class imbalanced.  (where we focused on) Within-class imbalanced.  It is important in text classification.  We focused on the minority class, we want a  high prediction for the minority class.. Two class problem = multiclass problem . 
  • 12. NOT VERY GOOD IN UNBALANCED DATA Popular evaluation for imbalance problem. Usually B=1, and =1 in this paper
  • 13. AUC: TP rate AREA UNDER ROC FP rate
  • 14. Data level: Change the distribution  ◦ make the data balanced Modify the existing data mining algorithms  ◦ Make new algorithms
  • 15. Random oversampling: duplicate  Random under sampling: (can remove  important data) Remove noise  SMOTE  Combine under sampling and over sampling.  Find the hard examples and over sample  them.
  • 16. Adaboost (increase weights of misclassified),  it does not perform well on imbalances ds.  Improve updated weights of TP & FP, better than weights of prediction based on TP & FP. Use a kernel of SVM  Use a BMPM  Biased Mini max Probability Machine. There are other cost-based learning… 
  • 17. A new Over-Sampling Method: Borderline-SMOTE.
  • 18. Algorithms usually  try to learn the borderline, as exactly as possible.
  • 19. Borderline-SMOTE1  Borderline-SMOTE2 
  • 20.
  • 21.
  • 22. Also oversampling the majority class.  The random numbers are between 0 and 0.5  so the synthetic examples are more close to each other.
  • 23.
  • 24.
  • 25.
  • 27.
  • 28. Nothing: base line.  SMOTE  Random over-sampling  Borderline-SMOTE1  Borderline-SMOTE2  K=5  10 Fold cross validation.  C4.5 classified  We only want to improve the prediction of the  minority class
  • 29.
  • 30.
  • 31.
  • 32.
  • 34. Is a common problem to work with  imbalanced data sets. Borderline examples are more easy to  misclassified. Our methods are better than traditional  SMOTE. Open to research:  ◦ how to define DANGER examples. ◦ Determination of number of examples in DANGER. ◦ Combine to data mining algorithms.
  • 35.
  • 36. You are free: •to copy, distribute, display, and perform the work •to make derivative works Under the following conditions: •Attribution. You must give the original author credit. What does quot;Attribute this workquot; mean? The page you came from contained embedded licensing metadata, including how the creator wishes to be attributed for re-use. You can use the HTML here to cite the work. Doing so will also include metadata on your page so that others can find the original work as well. •Non-Commercial. You may not use this work for commercial purposes. •For any reuse or distribution, you must make clear to others the licence terms of this work. •Any of these conditions can be waived if you get permission from the copyright holder. •Nothing in this license impairs or restricts the author's moral rights.