
ICLR 2020 Recap


This presentation was made on June 9th, 2020.

Video recording of the session can be viewed here: https://youtu.be/OCB9sTUnUug


In this meetup, Sanyam Bhutani, Machine Learning Engineer at H2O.ai, gives a recap of the eighth annual ICLR (International Conference on Learning Representations) 2020, a niche deep learning conference focused on studying how to learn representations of data, which is essentially what deep learning does.

Sanyam walks through a few of his favorite papers from this year's ICLR. Note that a single session cannot capture the full richness of every paper or allow a detailed discussion of each one.

You can find Sanyam in our community Slack (https://www.h2o.ai/slack-community/). Please feel free to start a discussion with him; send an emoji greeting and you'll get an answer.

The following papers are covered:

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation


AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty


Your classifier is secretly an energy based model and you should treat it like one


ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators


ALBERT: A Lite BERT for Self-supervised Learning of Language Representations


Reformer: The Efficient Transformer


Generative Models for Effective ML on Private, Decentralized Datasets


Once for All: Train One Network and Specialize it for Efficient Deployment


Thieves on Sesame Street! Model Extraction of BERT-based APIs


Plug and Play Language Models: A Simple Approach to Controlled Text Generation


BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning


Real or Not Real, that is the Question


ICLR 2020 Recap

  1. ICLR 2020 Recap. Selected paper summaries and discussions. Sanyam Bhutani, ML Engineer & AI Content Creator. bhutanisanyam1 🎙: ctdsshow
  2. Democratizing AI. Our mission to use AI for Good permeates into everything we do. AI Transformation: bringing AI to industry by helping companies transform their businesses with H2O.ai (Trusted Partner). AI4GOOD: bringing AI to impact by augmenting non-profits and social ventures with technological resources and capabilities (Impact/Social). Open Source: an industry leader in providing open source, cutting edge AI & ML platforms (H2O-3) (Community).
  3. H2O.ai Snapshot. We are Established: founded in Silicon Valley in 2012; funding: $147M, Series D; investors: Goldman Sachs, Ping An, Wells Fargo, NVIDIA, Nexus Ventures. We Make World-class AI Platforms: H2O open source machine learning, H2O Driverless AI (automatic machine learning), H2O Q (AI platform for business users). We are Global: Mountain View, NYC, London, Paris, Ottawa, Prague, Chennai, Singapore; 220+ universities, 1K companies using H2O, 20K open source meetup members, 180K experts. We are Passionate about Customers: 4X customers in 2 years, all industries, all continents; Aetna/CVS, Allergan, AT&T, CapitalOne, CBA, Citi, Coca Cola, Bradesco, Dish, Disney, Franklin Templeton, Genentech, Kaiser Permanente, Lego, Merck, Pepsi, Reckitt Benckiser, Roche.
  4. Our Team is Made up of the World's Leading Data Scientists. Your projects are backed by 10% of the world's Data Science Grandmasters, who are relentless in solving your critical problems.
  5. Make Your Company an AI Company
  6. ICLR 2020: What is ICLR?
  7. AGENDA • What is ICLR? • Paper Selection • 8 Paper Summaries • Q & A
  8. (figure)
  9. Paper Summaries • GAN related use cases • Deployment discussions • Adversarial attacks • Sesame Street (Transformers)
  10. "The Cutting edge of DL is about Engineering" - Jeremy Howard
  11. U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation - Junho Kim et al.
  12. U-GAT-IT • Image-to-image translation: Selfie2Anime, Horse2Zebra, Dog2Cat, Photo2VanGogh • A method for unsupervised image-to-image translation • Attention! (Attention is all you need) • Adaptive Layer-Instance Normalisation (AdaLIN)
  13. Architecture • Appreciating the problem • Attention! (Attention is all you need) • Adaptive Layer-Instance Normalisation (AdaLIN)
  14. To Summarise • Using attention to guide different geometric transforms • Introduction of a new normalising function • Image-to-image translation (and backwards!)
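The AdaLIN idea from the slides above can be summarised in a few lines. Below is a minimal PyTorch sketch, not the authors' implementation: a learnable gate rho blends instance-normalised and layer-normalised activations, and gamma/beta are style parameters assumed to be predicted elsewhere in the generator. Shapes, the initial value of rho, and the module name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdaLIN(nn.Module):
    """Sketch of Adaptive Layer-Instance Normalisation (AdaLIN)."""

    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Gate between Instance Norm (rho -> 1) and Layer Norm (rho -> 0).
        self.rho = nn.Parameter(torch.full((1, num_channels, 1, 1), 0.9))

    def forward(self, x, gamma, beta):
        # Instance Norm statistics: per sample, per channel.
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)

        # Layer Norm statistics: per sample, across channels and spatial dims.
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)

        rho = self.rho.clamp(0.0, 1.0)
        x_hat = rho * x_in + (1.0 - rho) * x_ln
        return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)


# Usage: normalise a feature map with style parameters predicted elsewhere.
feat = torch.randn(2, 64, 32, 32)
out = AdaLIN(64)(feat, torch.ones(64), torch.zeros(64))
```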
  15. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty - Dan Hendrycks et al.
  16. Image Augmentations • Why do you need image augmentations? • Test and train splits should be similar • Comparison of recent techniques • Why is AugMix promising?
  17. How does it work?
  18. To Summarise • Mixes augmented images and enforces consistent embeddings of the augmented images, which results in increased robustness and improved uncertainty calibration • AutoAugment • AugMix does not require tuning to work correctly: enables plug-and-play data augmentation
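To make the mixing step concrete, here is a minimal sketch of AugMix-style augmentation and the Jensen-Shannon consistency loss. It assumes PyTorch image tensors and a user-supplied list of augmentation callables; the paper composes PIL-level operations with its own hyperparameters, so treat this as illustrative only.

```python
import numpy as np
import torch
import torch.nn.functional as F

def augmix(image, augmentations, width=3, depth=3, alpha=1.0):
    """Mix several short random augmentation chains with the clean image."""
    ws = np.random.dirichlet([alpha] * width)        # weights of the chains
    m = float(np.random.beta(alpha, alpha))          # clean/augmented mix weight

    mix = torch.zeros_like(image)
    for i in range(width):
        aug = image.clone()
        for _ in range(np.random.randint(1, depth + 1)):
            op = augmentations[np.random.randint(len(augmentations))]
            aug = op(aug)
        mix = mix + float(ws[i]) * aug
    return m * image + (1.0 - m) * mix


def jensen_shannon_consistency(logits_clean, logits_aug1, logits_aug2):
    """Consistency loss between clean and two augmented views (added to CE)."""
    p_clean = F.softmax(logits_clean, dim=1)
    p_aug1 = F.softmax(logits_aug1, dim=1)
    p_aug2 = F.softmax(logits_aug2, dim=1)
    log_p_mix = ((p_clean + p_aug1 + p_aug2) / 3.0).clamp(1e-7, 1.0).log()
    return (F.kl_div(log_p_mix, p_clean, reduction='batchmean')
            + F.kl_div(log_p_mix, p_aug1, reduction='batchmean')
            + F.kl_div(log_p_mix, p_aug2, reduction='batchmean')) / 3.0
```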
  19. ELECTRA: Pre-Training Text Encoders as Discriminators Rather Than Generators - Kevin Clark et al.
  20. (figure)
  21. Pre-Training Progress • Progress in NLP as a measure of GLUE score • What is the GLUE score?
  22. (figure)
  23. Pre-Training Progress • Progress in NLP as a measure of GLUE score • What is the GLUE score? • Normalised by pre-training FLOPs
  24. Masked LM & ELECTRA • The BERT family uses MLM • Suggested: a bi-directional model that learns from all of the tokens rather than some % of masks
  25. Masked LM & ELECTRA • The BERT family uses MLM • Suggested: a bi-directional model that learns from all of the tokens rather than some % of masks
  26. Masked LM & ELECTRA • The BERT family uses MLM • Suggested: a bi-directional model that learns from all of the tokens rather than some % of masks • ELECTRA pre-training outperforms MLM pre-training
  27. To Summarise • Replaced token detection: a new self-supervised task for language representation learning • Training a text encoder to distinguish input tokens from high-quality negative samples produced by a small generator network • It works well even when using relatively small amounts of compute • Large train/inference speedups when compared to BERT-Base
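A rough sketch of replaced token detection follows, assuming a small masked-LM `generator` and a per-token binary `discriminator` defined elsewhere. The masking rate, loss weighting, and function names are assumptions for illustration, not ELECTRA's released code.

```python
import torch
import torch.nn as nn

def electra_step(generator, discriminator, tokens, mask_prob=0.15, mask_id=0):
    # 1) Mask a random subset of positions.
    is_masked = torch.rand_like(tokens, dtype=torch.float) < mask_prob
    masked = tokens.masked_fill(is_masked, mask_id)

    # 2) Generator proposes plausible replacements for the masked positions.
    gen_logits = generator(masked)                          # (B, T, vocab)
    mlm_loss = nn.functional.cross_entropy(
        gen_logits[is_masked], tokens[is_masked])
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(is_masked, sampled, tokens)

    # 3) Discriminator predicts, for every token, whether it was replaced.
    is_replaced = (corrupted != tokens).float()
    disc_logits = discriminator(corrupted)                  # (B, T)
    rtd_loss = nn.functional.binary_cross_entropy_with_logits(
        disc_logits, is_replaced)

    # The discriminator loss is weighted much higher than the MLM loss.
    return mlm_loss + 50.0 * rtd_loss
```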
  28. ALBERT: A Lite BERT for Language Understanding - Zhenzhong Lan et al.
  29. Introduction • At some point, further increases in model size become harder due to GPU/TPU memory limitations • Is having better NLP models as easy as having larger models? • How can we reduce parameters?
  30. Proposed Changes • Token embeddings are sparsely populated -> reduce size by projections • Re-use parameters of repeated operations
  31. Three More Tricks! • Sentence Order Prediction for capturing inter-sentence coherence • Remove dropout! • Adding more data increases performance
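The two parameter reductions from the "Proposed Changes" slide (factorised embeddings and cross-layer parameter sharing) can be shown in a few lines. This is a minimal PyTorch sketch with illustrative sizes, not the released ALBERT configuration.

```python
import torch
import torch.nn as nn

class AlbertSketch(nn.Module):
    """Sketch of ALBERT's two parameter reductions."""

    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=768,
                 num_layers=12, num_heads=12):
        super().__init__()
        # (1) Factorised embeddings: vocab -> small E, then project E -> H.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.project = nn.Linear(embed_dim, hidden_dim)
        # (2) Cross-layer sharing: one block reused at every depth.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.num_layers = num_layers

    def forward(self, token_ids):
        h = self.project(self.embed(token_ids))
        for _ in range(self.num_layers):     # same weights at every layer
            h = self.shared_block(h)
        return h


model = AlbertSketch()
print(sum(p.numel() for p in model.parameters()))  # far fewer params than BERT-Base
```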
  32. Once for All: Train One Network and Specialize it for Efficient Deployment - Han Cai et al.
  33. Introduction • Efficient deployment of DL models across devices • Conventional approach: train specialised models (think SqueezeNet, MobileNet, etc.) • Training costs $$$, engineering costs $$$
  34. Proposed Approach • Train once, specialise for deployment • Key idea: decouple model training from architecture search • Algorithm proposed: Progressive Shrinking
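The progressive-shrinking schedule can be sketched as a sampling space that grows stage by stage: first only the largest network is trained, then smaller kernels, depths, and widths are gradually allowed. The configuration keys and the `train_subnet` / `sample_subnet` helpers below are hypothetical; the paper additionally distils each sampled sub-network against the full network.

```python
import random

# Sampling spaces for each stage of progressive shrinking (illustrative values).
STAGES = [
    {"kernel": [7],       "depth": [4],       "width": [6]},        # full network
    {"kernel": [3, 5, 7], "depth": [4],       "width": [6]},        # elastic kernel
    {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [6]},        # elastic depth
    {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [3, 4, 6]},  # elastic width
]

def sample_subnet(space):
    """Pick one sub-network configuration from the current sampling space."""
    return {k: random.choice(v) for k, v in space.items()}

def progressive_shrinking(train_subnet, steps_per_stage=1000):
    """train_subnet(cfg) is assumed to run one training/distillation step
    on the sampled sub-network inside the shared once-for-all supernet."""
    for space in STAGES:
        for _ in range(steps_per_stage):
            train_subnet(sample_subnet(space))
```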
  37. Thieves on Sesame Street! Model Extraction of BERT-based APIs - Kalpesh Krishna et al.
  38. Attacks • Query the API with random sentences to probe the model • After a large number of queries, you have a labelled dataset • Note: these attacks are economically practical (cheaper than training a model yourself) • Note 2: this is not model distillation, it's IP theft
  39. (figure)
  40. Suggested Solutions • Membership classification: flagging suspicious queries • API watermarking: some % of queries return a wrong output; these "watermarked queries" and their outputs are stored on the API side • Note: both of these would fail against smart attacks
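The API-watermarking defence on the slide can be sketched as follows. The function names, storage scheme, and thresholds here are illustrative assumptions, not the paper's code: a small fraction of queries receives a deliberately wrong label, the pairs are stored, and a suspect model is later checked for having memorised them.

```python
import hashlib
import random

WATERMARK_RATE = 0.001
watermark_store = {}            # query hash -> (query text, wrong label served)

def serve_prediction(query, model_label, num_classes):
    """Occasionally serve a wrong label and remember which queries got one."""
    if random.random() < WATERMARK_RATE:
        wrong = (model_label + random.randrange(1, num_classes)) % num_classes
        key = hashlib.sha256(query.encode()).hexdigest()
        watermark_store[key] = (query, wrong)
        return wrong
    return model_label

def extraction_suspected(suspect_model, min_match=0.5):
    """A legitimate model should almost never reproduce the wrong labels;
    a model extracted from our API outputs will match many of them."""
    if not watermark_store:
        return False
    hits = sum(suspect_model(q) == wrong for q, wrong in watermark_store.values())
    return hits / len(watermark_store) >= min_match
```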
  41. Controlling Text Generation with Plug and Play Language Models - Rosanne Liu et al.
  42. (figure)
  43. Introduction • LMs can generate coherent, relatable text, either from scratch or by completing a passage started by the user • BUT they are hard to steer or control • They can also be triggered by certain adversarial attacks
  44. Controlling the Mammoth • Controlled generation: adding knobs with conditional probability • Consists of 3 steps:
  45. Controlling the Mammoth
  46. Controlling the Mammoth • Controlled generation: adding knobs with conditional probability • Consists of 3 steps • Also allows a reduction in toxicity from 63% to ~5%!
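The three PPLM steps (forward pass, gradient update of the latents from an attribute model, re-sample) can be sketched like this. `lm` and `attribute_model` are assumed interfaces, and the real method perturbs the transformer's key/value history rather than a single hidden vector, so this is only an illustration of the control loop.

```python
import torch
import torch.nn.functional as F

def pplm_step(lm, attribute_model, hidden, step_size=0.02, n_updates=3):
    """lm(h) is assumed to return (next-token logits, new hidden state);
    attribute_model(h) is assumed to return a scalar attribute log-likelihood."""
    delta = torch.zeros_like(hidden, requires_grad=True)
    base_probs = F.softmax(lm(hidden)[0].detach(), dim=-1)   # unperturbed dist.

    for _ in range(n_updates):
        logits, _ = lm(hidden + delta)                       # 1) forward pass
        attr_loss = -attribute_model(hidden + delta)         # 2) attribute ascent
        kl_loss = F.kl_div(F.log_softmax(logits, dim=-1),    #    stay close to LM
                           base_probs, reduction='batchmean')
        grad = torch.autograd.grad(attr_loss + 0.1 * kl_loss, delta)[0]
        delta = (delta - step_size * grad).detach().requires_grad_(True)

    # 3) Sample the next token from the steered distribution.
    logits, new_hidden = lm(hidden + delta)
    next_token = torch.distributions.Categorical(logits=logits).sample()
    return next_token, new_hidden
```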
  47. Generative Models for Effective ML on Private, Decentralized Datasets - Sean Augenstein et al.
  48. Introduction • Modelling is important: looking at data is a large part of the pipeline • Manual data inspection is problematic for privacy-sensitive datasets • Problem: your model resides on your server, the data on end devices
  49. Suggested Solutions • Modelling is important: looking at data is a large part of the pipeline • Manual data inspection is problematic for privacy-sensitive datasets • Problem: your model resides on your server, the data on end devices
  50. Suggested Solutions • DP federated GANs: train on user devices, inspect generated data • Repository showcases: language modelling with DP RNNs, image modelling with DP GANs
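A minimal sketch of one federated GAN round as described above: the discriminator sees only on-device data, its updates are averaged on the server, and the generator is trained server-side against the averaged discriminator, so engineers can inspect generated samples instead of raw user data. `clients`, `local_disc_update`, and `train_generator_step` are assumed helpers, and the differential-privacy clipping/noising of client updates is omitted for brevity.

```python
import copy
import torch

def federated_gan_round(generator, discriminator, clients,
                        local_disc_update, train_generator_step):
    # 1) Each client refines a copy of the discriminator on its private data,
    #    using samples from the current generator as fakes.
    client_states = []
    for client_data in clients:
        local_disc = copy.deepcopy(discriminator)
        local_disc_update(local_disc, generator, client_data)
        client_states.append(local_disc.state_dict())

    # 2) Server averages the client discriminators (assumes all state tensors
    #    are floating point; DP noise/clipping would be applied here).
    avg_state = {
        k: torch.stack([s[k] for s in client_states]).mean(dim=0)
        for k in client_states[0]
    }
    discriminator.load_state_dict(avg_state)

    # 3) Server trains the generator against the averaged discriminator.
    train_generator_step(generator, discriminator)
```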
  51. Thank You! 🍵 bhutanisanyam1 🎙: ctdsshow
  52. Questions?
