SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Downloaden Sie, um offline zu lesen
Semi-supervised learning with GANs : putting together the pieces together
OLX Berlin Data Science Team
Classifieds
OLX in a glance
350+ million
active users
40+ countries
60+ million new
listings per month
Market leader
in 35 countries
Classifieds - Data Science areas of Application
How GANs help in a semi-supervised setup
Using Colaboratory to train a GAN for semi-supervised learning
Options for putting the model in production
What are GANs
Outline
Short Intro
Intro to Deep Learning
Convolution Neural Networks in two images - part 1
Source : https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html
Convolution and Pooling
Source : https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html
Convolution Neural Networks in two images - part 2
http://machinelearninguru.com/computer_vision/basics/convolution/convolution_layer.html
Why does it work so well?
Hierarchical building block hypothesis
Source : https://medium.com/@johnsmart/your-personal-sim-pt-4-deep-agents-understanding-natural-intelligence-7040ae074b71
Sample Sizes for Deep Learning
Source : https://arxiv.org/pdf/1602.07332.pdf
Generative Adversarial Networks
Conceptual representation of a GAN
Source : https://stats.stackexchange.com/questions/277756/some-general-questions-on-generative-adversarial-networks
1D example of GAN
Source : https://ireneli.eu/2016/11/17/deep-learning-13-understanding-generative-adversarial-network/
GANs in one image
Source : https://news.developer.nvidia.com/edit-photos-with-gans
Types of GANs
Source : https://becominghuman.ai/unsupervised-image-to-image-translation-with-generative-adversarial-networks-adb3259b11b9
Limitations of GANs (1/3)
● Mode Collapse: Generator does not generate the full range of plausible samples
● More likely if the generator is optimized while the discriminator is kept constant for many iterations
Limitations of GANs (2/3)
● GANs predict the entire sample (e.g. image) at once
● Difficult to predict pixel neighborhoods
Limitations of GANs (3/3)
● Difficult training process / convergence: Generator and discriminator keep oscillating
● Network does not converge to a stable, (near) optimal solution
Semi-supervised learning with
GANs
Why using GANs in a semi-supervised setting?
● Create a more diverse set of unlabeled data
● Get a better / more descriptive decision boundary
● Improve generalization when the amount of training samples is small
Influence of unlabeled data in semi-supervised learning
● Continuity assumption
● Cluster assumption
● Manifold assumption
Example of GAN in a semi-supervised learning role
Source : https://towardsdatascience.com/semi-supervised-learning-with-gans-9f3cb128c5e
Sources of information and their role for the discriminator
● Real samples that have labels, similar to normal supervised learning
● Real samples that do not have labels. For those, the discriminator only learns that these images are
real
● Samples coming from the generator. For those, the discriminator learns to classify as fake
What do we have to modify to make it work?
● Adjust the loss functions so that we tackle both problems:
○ as per the original GAN task, use binary cross entropy for the GAN part
○ as per multiclass classification task, use softmax cross entropy (or sparse cross entropy with
logits) for the supervised task
○ apply masks to ignore the label not seen in the respective subtask
○ take the average of the GAN and supervised loss
● The discriminator can use the generator’s images, as well as the labeled and unlabeled training data,
to classify the dataset
Improving the GAN part
● Training GANs is a challenging problem -> feature matching approach
a. take the moments for some features for a “real” minibatch
b. take the moments for same features for a “fake” minibatch
c. minimize mean absolute error between the two
A use case from
Trust and Safety
Classifieds - Data Science areas of Application
Classifieds - Data Science areas of Application
● Spam can be a problem in open platforms like classifieds
● Its annoying and can also be used as a vehicle for fraud
● How can we keep bad users out without annoying good users with stringent policies?
Why is it a challenge?
● It’s a recurring battle, with spammers becoming more inventive to circumvent the algorithms
● We may have only a few cases of cases of annotated examples for certain types of bad content
Generating Synthetic Samples with GANs
Results
● In the presented example we only have a few tens of instances for a certain type of spam
● We are able to quickly explore the input space and map to a latent feature space combining the
different types of signal we have available from text, image, user behaviour, etc.
● Semi supervised learning with GANs does better (AUC = 0.999) vs Random Forest (AUC = 0.997)
Questions and Discussion
Deployment Options we will discuss
● Amazon Sagemaker
● AWS Lambda and Tensorflow
AWS Sagemaker in the ML Stack
AWS SageMaker
AWS Lambda
39
Applied Data Science Vision - Building the BasisAWS Lambda with Tensorflow
Resources
● Using Chalice to serve SageMaker predictions
○ https://medium.com/@julsimon/using-chalice-to-serve-sagemaker-predictions-a2015c02b033
● SageMaker examples
○ https://github.com/awslabs/amazon-sagemaker-examples
● How to Deploy Deep Learning Models with AWS Lambda and Tensorflow
○ https://aws.amazon.com/blogs/machine-learning/how-to-deploy-deep-learning-models-with-aws-la
mbda-and-tensorflow/
References
● Getting started with ML (Google)
○ https://drive.google.com/file/d/0B0phzgXZajPFZFRYNFRicjlZTVk/view
● Semi-supervised learning with Generative Adversarial Networks (GANs)
○ https://towardsdatascience.com/semi-supervised-learning-with-gans-9f3cb128c5e
● CNNs, three things you need to know
○ https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html
● Semi-supervised learning
○ https://en.wikipedia.org/wiki/Semi-supervised_learning
● Variants of GANs
○ https://www.slideshare.net/thinkingfactory/variants-of-gans-jaejun-yoo
● From GAN to WGAN
○ https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html

Weitere ähnliche Inhalte

Ähnlich wie Predictive analytics semi-supervised learning with GANs

GNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptGNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.ppt
ManiMaran230751
 
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Codiax
 

Ähnlich wie Predictive analytics semi-supervised learning with GANs (20)

GANs Presentation.pptx
GANs Presentation.pptxGANs Presentation.pptx
GANs Presentation.pptx
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
 
GNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptGNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.ppt
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
Reading group gan - 20170417
Reading group   gan - 20170417Reading group   gan - 20170417
Reading group gan - 20170417
 
Python Machine Learning - Getting Started
Python Machine Learning - Getting StartedPython Machine Learning - Getting Started
Python Machine Learning - Getting Started
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines
 
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
C3 w5
C3 w5C3 w5
C3 w5
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)
 
Weak Supervision.pdf
Weak Supervision.pdfWeak Supervision.pdf
Weak Supervision.pdf
 
Nips 2016 tutorial generative adversarial networks review
Nips 2016 tutorial  generative adversarial networks reviewNips 2016 tutorial  generative adversarial networks review
Nips 2016 tutorial generative adversarial networks review
 
Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learning
 
Intelligent Thumbnail Selection
Intelligent Thumbnail SelectionIntelligent Thumbnail Selection
Intelligent Thumbnail Selection
 
Project report
Project reportProject report
Project report
 

Kürzlich hochgeladen

Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
Kayode Fayemi
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
raffaeleoman
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
David Celestin
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
Kayode Fayemi
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
amilabibi1
 

Kürzlich hochgeladen (15)

Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
 

Predictive analytics semi-supervised learning with GANs

  • 1. Semi-supervised learning with GANs : putting together the pieces together OLX Berlin Data Science Team
  • 3. OLX in a glance 350+ million active users 40+ countries 60+ million new listings per month Market leader in 35 countries
  • 4. Classifieds - Data Science areas of Application
  • 5. How GANs help in a semi-supervised setup Using Colaboratory to train a GAN for semi-supervised learning Options for putting the model in production What are GANs Outline Short Intro
  • 6. Intro to Deep Learning
  • 7. Convolution Neural Networks in two images - part 1 Source : https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html
  • 8. Convolution and Pooling Source : https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html
  • 9. Convolution Neural Networks in two images - part 2 http://machinelearninguru.com/computer_vision/basics/convolution/convolution_layer.html
  • 10. Why does it work so well?
  • 11. Hierarchical building block hypothesis Source : https://medium.com/@johnsmart/your-personal-sim-pt-4-deep-agents-understanding-natural-intelligence-7040ae074b71
  • 12. Sample Sizes for Deep Learning Source : https://arxiv.org/pdf/1602.07332.pdf
  • 14. Conceptual representation of a GAN Source : https://stats.stackexchange.com/questions/277756/some-general-questions-on-generative-adversarial-networks
  • 15. 1D example of GAN Source : https://ireneli.eu/2016/11/17/deep-learning-13-understanding-generative-adversarial-network/
  • 16. GANs in one image Source : https://news.developer.nvidia.com/edit-photos-with-gans
  • 17. Types of GANs Source : https://becominghuman.ai/unsupervised-image-to-image-translation-with-generative-adversarial-networks-adb3259b11b9
  • 18. Limitations of GANs (1/3) ● Mode Collapse: Generator does not generate the full range of plausible samples ● More likely if the generator is optimized while the discriminator is kept constant for many iterations
  • 19. Limitations of GANs (2/3) ● GANs predict the entire sample (e.g. image) at once ● Difficult to predict pixel neighborhoods
  • 20. Limitations of GANs (3/3) ● Difficult training process / convergence: Generator and discriminator keep oscillating ● Network does not converge to a stable, (near) optimal solution
  • 22. Why using GANs in a semi-supervised setting? ● Create a more diverse set of unlabeled data ● Get a better / more descriptive decision boundary ● Improve generalization when the amount of training samples is small
  • 23. Influence of unlabeled data in semi-supervised learning ● Continuity assumption ● Cluster assumption ● Manifold assumption
  • 24. Example of GAN in a semi-supervised learning role Source : https://towardsdatascience.com/semi-supervised-learning-with-gans-9f3cb128c5e
  • 25. Sources of information and their role for the discriminator ● Real samples that have labels, similar to normal supervised learning ● Real samples that do not have labels. For those, the discriminator only learns that these images are real ● Samples coming from the generator. For those, the discriminator learns to classify as fake
  • 26. What do we have to modify to make it work? ● Adjust the loss functions so that we tackle both problems: ○ as per the original GAN task, use binary cross entropy for the GAN part ○ as per multiclass classification task, use softmax cross entropy (or sparse cross entropy with logits) for the supervised task ○ apply masks to ignore the label not seen in the respective subtask ○ take the average of the GAN and supervised loss ● The discriminator can use the generator’s images, as well as the labeled and unlabeled training data, to classify the dataset
  • 27. Improving the GAN part ● Training GANs is a challenging problem -> feature matching approach a. take the moments for some features for a “real” minibatch b. take the moments for same features for a “fake” minibatch c. minimize mean absolute error between the two
  • 28. A use case from Trust and Safety
  • 29. Classifieds - Data Science areas of Application
  • 30. Classifieds - Data Science areas of Application ● Spam can be a problem in open platforms like classifieds ● Its annoying and can also be used as a vehicle for fraud ● How can we keep bad users out without annoying good users with stringent policies?
  • 31. Why is it a challenge? ● It’s a recurring battle, with spammers becoming more inventive to circumvent the algorithms ● We may have only a few cases of cases of annotated examples for certain types of bad content
  • 33. Results ● In the presented example we only have a few tens of instances for a certain type of spam ● We are able to quickly explore the input space and map to a latent feature space combining the different types of signal we have available from text, image, user behaviour, etc. ● Semi supervised learning with GANs does better (AUC = 0.999) vs Random Forest (AUC = 0.997)
  • 35. Deployment Options we will discuss ● Amazon Sagemaker ● AWS Lambda and Tensorflow
  • 36. AWS Sagemaker in the ML Stack
  • 39. 39 Applied Data Science Vision - Building the BasisAWS Lambda with Tensorflow
  • 40. Resources ● Using Chalice to serve SageMaker predictions ○ https://medium.com/@julsimon/using-chalice-to-serve-sagemaker-predictions-a2015c02b033 ● SageMaker examples ○ https://github.com/awslabs/amazon-sagemaker-examples ● How to Deploy Deep Learning Models with AWS Lambda and Tensorflow ○ https://aws.amazon.com/blogs/machine-learning/how-to-deploy-deep-learning-models-with-aws-la mbda-and-tensorflow/
  • 41. References ● Getting started with ML (Google) ○ https://drive.google.com/file/d/0B0phzgXZajPFZFRYNFRicjlZTVk/view ● Semi-supervised learning with Generative Adversarial Networks (GANs) ○ https://towardsdatascience.com/semi-supervised-learning-with-gans-9f3cb128c5e ● CNNs, three things you need to know ○ https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html ● Semi-supervised learning ○ https://en.wikipedia.org/wiki/Semi-supervised_learning ● Variants of GANs ○ https://www.slideshare.net/thinkingfactory/variants-of-gans-jaejun-yoo ● From GAN to WGAN ○ https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html