Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko, AI&BigData Online Day 2021
Sr ML Engineer at Depositphotos
About me
❏ Yurii Pashchenko
❏ Sr Machine Learning Engineer at Depositphotos
❏ Over 8 years of research and commercial experience in applying Deep Learning models
❏ Object Detection Specialist
❏ Knowledge Sharing Master at Transformer (*at least, I want to become one)
Zero-shot learning capabilities of CLIP model from OpenAI
❏ Short intro to Zero-Shot Learning and CLIP from OpenAI
❏ Zero-Shot Classification based on CLIP
❏ CLIP for image ranking & search
❏ Limitations of CLIP model
❏ Object Detection/Segmentation
❏ Knowledge distillation
❏ GANs + CLIP
What is Zero-Shot Learning?
Understanding Zero-Shot Learning — Making ML More Human
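Motivation for CLIP from OpenAI
● Costly datasets: labeling vision datasets is labor-intensive and expensive
● Narrow: models learn only a fixed set of visual concepts and need significant effort to adapt to each new task
● Poor real-world performance: strong benchmark results often fail to carry over to stress tests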
CLIP: Connecting Text and Images
CLIP: Contrastive Language-Image Pre-training
Learning Transferable Visual Models From Natural Language Supervision
● 400 million (image, text) pairs collected from the Internet
● Trained modified ResNet-50 variants and Vision Transformers (ViT-B/32, ViT-B/16, ViT-L/14)
● Batch size of 32,768, trained for 32 epochs
● The largest ResNet model, RN50x64, took 18 days to train on 592 V100 GPUs, while the largest Vision Transformer took 12 days on 256 V100 GPUs
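CLIP is trained with a symmetric contrastive objective over each batch: image i should match caption i and none of the other captions, and vice versa. The pure-Python sketch below assumes the image and text embeddings were already produced by the two encoders. The function names are illustrative, and the fixed temperature of 0.07 is only the value CLIP's learnable temperature is initialized to.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable softmax cross-entropy for one row of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: image i and text i form the only positive pair."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: every row should peak on the diagonal.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: the same check applied to the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(cross_entropy(cols[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy batch of two pairs: matching captions vs. swapped captions.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]
print(clip_contrastive_loss(images, texts) < clip_contrastive_loss(images, texts[::-1]))  # True
```

Averaging the two directions is what makes the loss symmetric. Both encoders are pushed toward a shared embedding space, and later slides rely on that space.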
CLIP for Zero-Shot Classification
Learning Transferable Visual Models From Natural Language Supervision
Prompt engineering combined with ensembling over 80 prompt templates improves ImageNet zero-shot accuracy by almost 5%
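Zero-shot classification with CLIP works as follows. Each class name is turned into text prompts, the prompts are embedded with the text encoder, and the image is assigned to the class whose text embedding is most similar to the image embedding. Prompt ensembling averages the normalized embeddings of several templates per class, then renormalizes. In the pure-Python sketch below, `toy_encode_text` is a hypothetical stand-in for CLIP's text encoder. The three templates are only examples in the paper's style, not its full list of 80. The logit scale of 100 matches the value commonly used with released CLIP checkpoints.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_zero_shot_weights(class_names: List[str], templates: List[str],
                            encode_text: Callable[[str], Vector]) -> Dict[str, Vector]:
    """One unit-length weight per class: the mean of its normalized prompt embeddings."""
    weights = {}
    for name in class_names:
        embs = [normalize(encode_text(t.format(name))) for t in templates]
        mean = [sum(col) / len(embs) for col in zip(*embs)]
        weights[name] = normalize(mean)
    return weights


def classify(image_emb: Vector, weights: Dict[str, Vector],
             logit_scale: float = 100.0) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between the image and every class."""
    img = normalize(image_emb)
    logits = {c: logit_scale * sum(a * b for a, b in zip(img, w))
              for c, w in weights.items()}
    m = max(logits.values())
    exps = {c: math.exp(z - m) for c, z in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def toy_encode_text(text: str) -> Vector:
    # Hypothetical stand-in for CLIP's text encoder, for demonstration only.
    return [1.0 if "dog" in text else 0.0, 1.0 if "cat" in text else 0.0, 0.2]


templates = ["a photo of a {}.", "a blurry photo of a {}.", "a sketch of a {}."]
weights = build_zero_shot_weights(["dog", "cat"], templates, toy_encode_text)
probs = classify([0.9, 0.1, 0.2], weights)
print(max(probs, key=probs.get))  # dog
```

The class weights are computed once and reused for every image. That is why ensembling many templates adds no per-image inference cost.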
CLIP Zero-Shot visual results
CLIP: Connecting Text and Images
CLIP Zero-Shot generalization
Learning Transferable Visual Models From Natural Language Supervision
CLIP Zero-Shot vs Few-Shot
Learning Transferable Visual Models From Natural Language Supervision
CLIP on FairFace
FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation
CLIP has a top-1 accuracy of 59.2% for “in the wild” celebrity image classification when choosing from 100 candidates and a top-1 accuracy of 43.3% when choosing from 1000 possible choices
CLIP for Image Ranking
DALL·E: Creating Images from Text
“an armchair in the shape of an avocado”
“a living room with two white armchairs and a painting of the colosseum. the painting is mounted above a modern fireplace”
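In the DALL·E post, CLIP reranks generated samples. Many candidate images are generated for a caption, and only those CLIP scores as most similar to the caption are kept; the post shows the top 32 of 512. The sketch below covers just the reranking step. It assumes the caption and candidate embeddings were already computed with CLIP, and `rerank` is an illustrative name.

```python
import heapq
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rerank(caption_emb: Vector, candidate_embs: List[Vector],
           top_k: int) -> List[Tuple[int, float]]:
    """Return (candidate index, CLIP score) for the top_k candidates, best first."""
    scored = ((i, cosine(caption_emb, e)) for i, e in enumerate(candidate_embs))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Toy embeddings: in practice these come from CLIP's text and image encoders.
caption = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0], [0.5, 0.0]]
print([i for i, _ in rerank(caption, candidates, top_k=2)])  # [3, 1]
```

`heapq.nlargest` avoids sorting every candidate when only a small top-k is needed.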
CLIP for Image Search
Text-to-Image
Unsplash Image Search
CLIP for Image Search
Image-to-Image
Unsplash Image Search
CLIP for Image Search
Text+Text-to-Image
Unsplash Image Search
CLIP for Image Search
Image+Text-to-Image
Unsplash Image Search
Query: an example image + the text “cars”
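All four search modes can share one index of precomputed CLIP image embeddings. A text or image query is embedded with the matching encoder. A combined query, such as an image plus “cars”, can be formed by adding the normalized embeddings (optionally weighted) and renormalizing before the nearest-neighbour lookup. The sketch below assumes the embeddings already exist and uses hypothetical 3-D vectors. The weighted sum is one simple way to combine queries and is not necessarily what the Unsplash Image Search project does.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def combine_queries(parts: Sequence[Tuple[Vector, float]]) -> Vector:
    """Weighted sum of normalized query embeddings (text or image), renormalized."""
    dim = len(parts[0][0])
    total = [0.0] * dim
    for emb, weight in parts:
        unit = normalize(emb)
        for k in range(dim):
            total[k] += weight * unit[k]
    return normalize(total)


def search(query: Vector, index: Dict[str, Vector], top_k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force cosine search over precomputed image embeddings."""
    q = normalize(query)
    scores = [(photo_id, sum(a * b for a, b in zip(q, normalize(emb))))
              for photo_id, emb in index.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical 3-D embeddings standing in for real CLIP vectors.
index = {"street": [1.0, 0.0, 0.0], "street_with_cars": [0.7, 0.7, 0.0],
         "car_closeup": [0.0, 1.0, 0.0], "beach": [0.0, 0.0, 1.0]}
street_photo, cars_text = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(search(street_photo, index, top_k=1)[0][0])  # street
combined = combine_queries([(street_photo, 1.0), (cars_text, 1.0)])
print(search(combined, index, top_k=1)[0][0])  # street_with_cars
```

For a large photo collection, the brute-force loop would normally be replaced by an approximate nearest-neighbour index. The query construction stays the same.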
CLIP limitations
Learning Transferable Visual Models From Natural Language Supervision
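● poor generalization to images not covered by its pre-training dataset: on MNIST handwritten digits, zero-shot CLIP reaches only 88% accuracy, below logistic regression on raw pixels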
CLIP limitations
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
Text prompts: “an elephant”, “a zebra”, “a lake” (examples from this Colab)
CLIP limitations
Learning Transferable Visual Models From Natural Language Supervision
● poor generalization to images not covered by its pre-training dataset (e.g. MNIST)
● counting the number of objects in an image
● predicting how close the nearest object is in a photo
● CLIP’s zero-shot classifiers can be sensitive to wording or phrasing and sometimes require trial-and-error “prompt engineering” to perform well.
You can’t just make an Object Detector from a Classifier… without fine-tuning
Assembling an Object Detector with CLIP
Rich feature hierarchies for accurate object detection and semantic segmentation
[Figure: R-CNN pipeline combined with CLIP and its Text Encoder; example label “person”]
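The assembly on this slide can be sketched as an R-CNN-style pipeline. Take class-agnostic region proposals, crop each region, and embed the crop with CLIP's image encoder. Then label the region with the most similar text prompt and discard it if even the best similarity is too low. In the pure-Python sketch below, `embed_crop` is a hypothetical callable standing in for cropping plus CLIP's image encoder, and the text embeddings are assumed precomputed. `min_score` is an arbitrary illustrative threshold. As the slide warns, such a naive pipeline should not be expected to match a fine-tuned detector.

```python
import math
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_regions(proposals: List[Box], embed_crop: Callable[[Box], Vector],
                  text_embs: Dict[str, Vector],
                  min_score: float = 0.25) -> List[Tuple[Box, str, float]]:
    """Give every proposal the label of its most similar text prompt; drop weak matches."""
    detections = []
    for box in proposals:
        region = embed_crop(box)  # hypothetical: crop to `box`, run CLIP's image encoder
        label, score = max(((name, cosine(region, t)) for name, t in text_embs.items()),
                           key=lambda pair: pair[1])
        if score >= min_score:
            detections.append((box, label, score))
    return detections


# Toy data: precomputed "crop embeddings" and prompt embeddings (e.g. "a photo of a person").
crop_embs = {(0, 0, 10, 20): [0.9, 0.1, 0.1], (5, 5, 15, 15): [0.1, 0.9, 0.2],
             (20, 20, 30, 30): [0.0, 0.0, 1.0]}
text_embs = {"person": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
for box, label, score in label_regions(list(crop_embs), crop_embs.__getitem__, text_embs):
    print(box, label, round(score, 3))
```

The third box matches no prompt well and is dropped. This is the role a background class plays in a trained detector.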
Region proposals alternatives
Salient Object Detection Techniques in Computer Vision—A Survey
Salient object detection (SOD) is an important computer vision task aimed at precise detection and segmentation of visually distinctive image regions from the perspective of the human visual system
Region proposals alternatives
Open-World Entity Segmentation
Entity Segmentation is a segmentation task with the aim to segment everything in an image into semantically-meaningful regions without considering any category labels.
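Both alternatives output masks rather than boxes. To feed them into a CLIP-based region classifier, each binary mask can be converted into a tight bounding box and used as a region proposal. The sketch below assumes masks arrive as nested lists of 0/1 values, standing in for the output of an SOD or entity-segmentation model. `mask_to_box`, `masks_to_proposals` and the `min_area` filter are illustrative helpers.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # mask[y][x] is 1 inside the region, 0 outside
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with exclusive x1, y1


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tightest box around the non-zero pixels, or None for an empty mask."""
    if not mask:
        return None
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def masks_to_proposals(masks: List[Mask], min_area: int = 4) -> List[Box]:
    """Turn class-agnostic masks into box proposals, skipping tiny regions."""
    proposals = []
    for mask in masks:
        box = mask_to_box(mask)
        if box is None:
            continue
        x0, y0, x1, y1 = box
        if (x1 - x0) * (y1 - y0) >= min_area:
            proposals.append(box)
    return proposals


blob = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
speck = [[1, 0], [0, 0]]
print(masks_to_proposals([blob, speck]))  # [(1, 1, 3, 3)]
```

Because the masks carry no category labels, CLIP remains the only component that decides what each region contains.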
What is knowledge distillation?
Knowledge Distillation: Simplified
Knowledge distillation refers to the idea of model compression by teaching a smaller network, step by step, exactly what to do using a bigger, already trained network.
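A common way to implement this is the soft-target loss of Hinton et al. The student is trained to match the teacher's temperature-softened class probabilities, usually mixed with the ordinary cross-entropy on the true label. The slide does not fix a specific loss, so the sketch below shows one standard formulation. The temperature `T`, the mixing weight `alpha` and the `T**2` scaling are conventional choices, not values from the talk.

```python
import math
from typing import List


def softmax(logits: List[float], T: float = 1.0) -> List[float]:
    """Softmax of logits / T; a higher temperature T gives softer probabilities."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q); terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, T: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * soft-target KL (scaled by T^2) + (1 - alpha) * hard-label cross-entropy."""
    soft = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T)) * T * T
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


teacher = [2.0, 0.5, -1.0]
print(distillation_loss([2.0, 0.5, -1.0], teacher, label=0))  # only the hard-label term remains
print(distillation_loss([-1.0, 0.5, 2.0], teacher, label=0))  # larger: student disagrees with teacher
```

Multiplying by `T**2` keeps the gradient magnitude of the soft term roughly independent of the chosen temperature.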
Mask R-CNN
- Why?
- Class-agnostic bbox regression and mask prediction
Mask R-CNN
Vision and Language Knowledge Distillation (ViLD)
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
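ViLD distills CLIP into a two-stage detector in two ways. ViLD-text classifies each region embedding against CLIP text embeddings of the category names, plus a learned background embedding, with a temperature-scaled softmax. ViLD-image pulls the region embedding toward CLIP's image embedding of the cropped region with an L1 loss. The simplified pure-Python sketch below shows the two per-region losses and assumes all embeddings are already computed. The function names, the temperature value and the plain-list representation are illustrative, not the paper's code.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def vild_text_loss(region_emb: Vector, text_embs: List[Vector], background_emb: Vector,
                   label: int, temperature: float = 0.01) -> float:
    """Cross-entropy over cosine similarities to [background] + category text embeddings.

    label 0 is background; label k >= 1 refers to text_embs[k - 1].
    """
    r = normalize(region_emb)
    classes = [normalize(background_emb)] + [normalize(t) for t in text_embs]
    logits = [sum(a * b for a, b in zip(r, c)) / temperature for c in classes]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def vild_image_loss(region_emb: Vector, clip_crop_emb: Vector) -> float:
    """L1 distance between the normalized region embedding and CLIP's crop embedding."""
    return sum(abs(a - b) for a, b in zip(normalize(region_emb), normalize(clip_crop_emb)))


texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # e.g. "person", "car" prompt embeddings
background = [0.0, 0.0, 1.0]                 # learned in ViLD; fixed here for the demo
region = [1.0, 0.1, 0.0]
print(vild_text_loss(region, texts, background, label=1)
      < vild_text_loss(region, texts, background, label=2))  # True
print(vild_image_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # close to 0: same direction
```

Classification happens purely through text embeddings, so new category names can be added at inference time by embedding them. This is the open-vocabulary property shown on the following slides.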
ViLD results
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
ViLD generalization ability
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
ViLD visualizations
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
ViLD visualizations
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Zero-Shot Object Tracking
Introducing Zero Shot Object Tracking
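The Roboflow post uses CLIP as the appearance-feature extractor inside a Deep SORT tracker. The core idea can be sketched without Deep SORT's motion model. Each detection crop is embedded with CLIP and linked to the existing track whose last embedding is most similar; when nothing is similar enough, a new track starts. This greedy matcher and its `threshold` are simplifying assumptions for illustration, not the Deep SORT algorithm.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class GreedyEmbeddingTracker:
    """Links detections across frames by CLIP-embedding similarity (no motion model)."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.tracks: Dict[int, Vector] = {}  # track id -> most recent embedding
        self.next_id = 0

    def update(self, embeddings: List[Vector]) -> List[int]:
        """Return one track id per detection embedding of the current frame."""
        ids: List[int] = []
        used = set()
        for emb in embeddings:
            best_id: Optional[int] = None
            best_sim = self.threshold
            for track_id, last in self.tracks.items():
                if track_id in used:
                    continue  # each track takes at most one detection per frame
                sim = cosine(emb, last)
                if sim >= best_sim:
                    best_id, best_sim = track_id, sim
            if best_id is None:  # nothing similar enough: start a new track
                best_id = self.next_id
                self.next_id += 1
            self.tracks[best_id] = emb
            used.add(best_id)
            ids.append(best_id)
        return ids


tracker = GreedyEmbeddingTracker()
print(tracker.update([[1.0, 0.0], [0.0, 1.0]]))   # [0, 1]
print(tracker.update([[0.1, 1.0], [1.0, 0.05]]))  # [1, 0]
print(tracker.update([[-1.0, 0.0]]))              # [2]
```

Since detections can come from any zero-shot detector and association uses only CLIP features, the tracker needs no training on the object category being tracked.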
VQGAN + CLIP
The Illustrated VQGAN
VQGAN + CLIP
https://github.com/nerdyrodent/VQGAN-CLIP
"A painting of an apple in a fruit bowl | psychedelic | surreal:0.5 | weird:0.25"
"A painting of an apple in a fruit bowl"
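In the VQGAN-CLIP repository linked above, several prompts can be separated by `|`, and each may carry a weight after a colon. Unweighted prompts appear to default to a weight of 1, so in the first example “surreal” and “weird” are down-weighted to 0.5 and 0.25. The sketch below parses such a string and turns it into a weighted CLIP guidance loss, assuming the image and prompt embeddings are already computed. The helper names are hypothetical. The spherical distance is modeled on the one used in popular VQGAN+CLIP notebooks rather than copied from the repository.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_prompts(text: str) -> List[Tuple[str, float]]:
    """Split 'a | b:0.5' into [('a', 1.0), ('b', 0.5)]; the weight defaults to 1.0."""
    prompts = []
    for part in (p.strip() for p in text.split("|")):
        if not part:
            continue
        name, sep, weight = part.rpartition(":")
        if sep and _is_number(weight):
            prompts.append((name.strip(), float(weight)))
        else:
            prompts.append((part, 1.0))
    return prompts


def spherical_dist_sq(u: Vector, v: Vector) -> float:
    """Half the squared angle between u and v, computed as 2 * asin(chord / 2) ** 2."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    chord = math.sqrt(sum((a / nu - b / nv) ** 2 for a, b in zip(u, v)))
    return 2 * math.asin(min(chord / 2, 1.0)) ** 2


def guidance_loss(image_emb: Vector, weighted_prompt_embs: List[Tuple[Vector, float]]) -> float:
    """Weighted sum of distances; the generator's latent codes are optimized to minimize it."""
    return sum(w * spherical_dist_sq(image_emb, p) for p, w in weighted_prompt_embs)


print(parse_prompts("A painting of an apple in a fruit bowl | psychedelic | surreal:0.5 | weird:0.25"))
```

The call prints `[('A painting of an apple in a fruit bowl', 1.0), ('psychedelic', 1.0), ('surreal', 0.5), ('weird', 0.25)]`. In a full pipeline, each parsed text is embedded with CLIP's text encoder before being passed to `guidance_loss`.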
StyleCLIP
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
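StyleCLIP's latent-optimization variant edits an image by searching StyleGAN's extended latent space for a code w. The generated image should match a text prompt t in CLIP space, stay close to the source latent w_s, and keep the subject's identity. The objective as stated in the StyleCLIP paper is shown below; the paper's other two methods, a latent mapper and global directions, are not covered here.

```latex
w^{*} = \arg\min_{w \in \mathcal{W}+} \; D_{\mathrm{CLIP}}\big(G(w), t\big)
        + \lambda_{L2} \, \lVert w - w_s \rVert_2
        + \lambda_{\mathrm{ID}} \, \mathcal{L}_{\mathrm{ID}}(w),
\qquad
\mathcal{L}_{\mathrm{ID}}(w) = 1 - \big\langle R\big(G(w_s)\big), R\big(G(w)\big) \big\rangle
```

Here G is the pretrained StyleGAN generator and D_CLIP is the cosine distance between the CLIP embeddings of the generated image and the text. R is a pretrained face-recognition network (ArcFace), so the identity term applies to face images.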
StyleGAN3
Alias-Free Generative Adversarial Networks (StyleGAN3)
StyleGAN3 + CLIP
StyleGAN3 + CLIP by mishin_learning
Thank you for your attention!
Yurii Pashchenko AI&BigData Online Day 2021
Yurii Pashchenko
Sr ML Engineer at Depositphotos
yurii_pas
george.pashchenko@gmail.com

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
gpt3_presentation.pdf
gpt3_presentation.pdfgpt3_presentation.pdf
gpt3_presentation.pdf
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learning
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Depth estimation using deep learning
Depth estimation using deep learningDepth estimation using deep learning
Depth estimation using deep learning
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
【DL輪読会】StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
【DL輪読会】StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery【DL輪読会】StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
【DL輪読会】StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
 
Computer Vision.pptx
Computer Vision.pptxComputer Vision.pptx
Computer Vision.pptx
 
Masked Autoencoders Are Scalable Vision Learners.pptx
Masked Autoencoders Are Scalable Vision Learners.pptxMasked Autoencoders Are Scalable Vision Learners.pptx
Masked Autoencoders Are Scalable Vision Learners.pptx
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
 
Deep residual learning for image recognition
Deep residual learning for image recognitionDeep residual learning for image recognition
Deep residual learning for image recognition
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentation
 

Ähnlich wie Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI

"How Image Sensor and Video Compression Parameters Impact Vision Algorithms,"...
"How Image Sensor and Video Compression Parameters Impact Vision Algorithms,"..."How Image Sensor and Video Compression Parameters Impact Vision Algorithms,"...
"How Image Sensor and Video Compression Parameters Impact Vision Algorithms,"...
Edge AI and Vision Alliance
 
5 Lessons I’ve Learned Tackling Product Matching for E-commerce
5 Lessons I’ve Learned Tackling Product Matching for E-commerce 5 Lessons I’ve Learned Tackling Product Matching for E-commerce
5 Lessons I’ve Learned Tackling Product Matching for E-commerce
Govind Chandrasekhar
 

Ähnlich wie Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI (20)

Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
 
Yurii Pashchenko: Tips and tricks for building your own automated visual data...
Yurii Pashchenko: Tips and tricks for building your own automated visual data...Yurii Pashchenko: Tips and tricks for building your own automated visual data...
Yurii Pashchenko: Tips and tricks for building your own automated visual data...
 
What multimodal foundation models cannot perceive
What multimodal foundation models cannot perceiveWhat multimodal foundation models cannot perceive
What multimodal foundation models cannot perceive
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014
 
First 5 years of PSI:ML - Filip Panjevic
First 5 years of PSI:ML - Filip PanjevicFirst 5 years of PSI:ML - Filip Panjevic
First 5 years of PSI:ML - Filip Panjevic
 
Learning visual representation without human label
Learning visual representation without human labelLearning visual representation without human label
Learning visual representation without human label
 
Deep learning applications in e-commerce search: Dynamic talks Chicago 3/14/2019
Deep learning applications in e-commerce search: Dynamic talks Chicago 3/14/2019Deep learning applications in e-commerce search: Dynamic talks Chicago 3/14/2019
Deep learning applications in e-commerce search: Dynamic talks Chicago 3/14/2019
 
Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides
 
Cvpr 2017 Summary Meetup
Cvpr 2017 Summary MeetupCvpr 2017 Summary Meetup
Cvpr 2017 Summary Meetup
 
Searching Across Images and Test
Searching Across Images and TestSearching Across Images and Test
Searching Across Images and Test
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
 
How I became ML Engineer
How I became ML Engineer How I became ML Engineer
How I became ML Engineer
 
Learning where to look: focus and attention in deep vision
Learning where to look: focus and attention in deep visionLearning where to look: focus and attention in deep vision
Learning where to look: focus and attention in deep vision
 
Deep Representation: Building a Semantic Image Search Engine
Deep Representation: Building a Semantic Image Search EngineDeep Representation: Building a Semantic Image Search Engine
Deep Representation: Building a Semantic Image Search Engine
 
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
 
A guide to Face Detection in Python.pdf
A guide to Face Detection in Python.pdfA guide to Face Detection in Python.pdf
A guide to Face Detection in Python.pdf
 
"How Image Sensor and Video Compression Parameters Impact Vision Algorithms,"...
"How Image Sensor and Video Compression Parameters Impact Vision Algorithms,"..."How Image Sensor and Video Compression Parameters Impact Vision Algorithms,"...
"How Image Sensor and Video Compression Parameters Impact Vision Algorithms,"...
 
How to use transfer learning to bootstrap image classification and question a...
How to use transfer learning to bootstrap image classification and question a...How to use transfer learning to bootstrap image classification and question a...
How to use transfer learning to bootstrap image classification and question a...
 
09 grasp
09 grasp09 grasp
09 grasp
 
5 Lessons I’ve Learned Tackling Product Matching for E-commerce
5 Lessons I’ve Learned Tackling Product Matching for E-commerce 5 Lessons I’ve Learned Tackling Product Matching for E-commerce
5 Lessons I’ve Learned Tackling Product Matching for E-commerce
 

Mehr von Lviv Startup Club

Mehr von Lviv Startup Club (20)

Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
 
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
 
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
 
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
 
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
 
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
 
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
 
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
 
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
 
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
 
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
 
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
 
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
 
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
 
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
 
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
 

Kürzlich hochgeladen

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 

Kürzlich hochgeladen (20)

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 

Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI

  • 1. Zero-shot learning capabilities of CLIP model from Yurii Pashchenko AI&BigData Online Day 2021 Yurii Pashchenko Sr ML Engineer at Depositphotos
  • 2. About me ❏ Yurii Pashchenko ❏ Sr Machine Learning Engineer at Depositphotos ❏ Over 8 years of research and commercial experience in applying Deep Learning models ❏ Object Detection Specialist ❏ Knowledge Sharing Master at Transformer* at least I want to become ��
  • 3. Zero-shot learning capabilities of CLIP model from OpenAI ❏ Short intro to Zero-Shot Learning and CLIP from OpenAI ❏ Zero-Shot Classification based on CLIP ❏ CLIP for image ranking & search ❏ Limitations of CLIP model ❏ Object Detection/Segmentation ❏ Knowledge distillation ❏ GANs + CLIP
  • 4. What is Zero-Shot Learning Understanding Zero-Shot Learning — Making ML More Human
  • 5. Motivation of CLIP from OpenAI? ● Costly datasets ● Narrow ● Poor real-world performance CLIP: Connecting Text and Images
  • 6. CLIP: Contrastive Language-Image Pre-training Learning Transferable Visual Models From Natural Language Supervision ● 400 million (image, text) pairs collected from Internet. ● Trained modifications of ResNet-50 and ViT-B ● Batch size 32 768 for 32 epochs ● The largest ResNet model, RN50x64, took 18 days to train on 592 V100 GPUs while the largest Vision Transformer took 12 days on 256 V100 GPUs
  • 7. Zero-shot learning capabilities of CLIP model from OpenAI ❏ Short intro to Zero-Shot Learning and CLIP from OpenAI ❏ Zero-Shot Classification based on CLIP ❏ CLIP for image ranking & search ❏ Limitations of CLIP model ❏ Object Detection/Segmentation ❏ Knowledge distillation ❏ GANs + CLIP
  • 8. CLIP for Zero-Shot Classification Learning Transferable Visual Models From Natural Language Supervision Ensembling around 80 prompts improve ImageNet accuracy by almost 5%
  • 9. CLIP Zero-Shot visual results CLIP: Connecting Text and Images
  • 10. CLIP Zero-Shot generalization Learning Transferable Visual Models From Natural Language Supervision
  • 11. CLIP Zero-Shot vs Few-Shot Learning Transferable Visual Models From Natural Language Supervision
  • 12. CLIP on FairFace FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation CLIP reaches a top-1 accuracy of 59.2% for “in the wild” celebrity image classification when choosing among 100 candidates, and 43.3% when choosing among 1,000 candidates
  • 14. CLIP for Image Ranking DALL·E: Creating Images from Text “an armchair in the shape of an avocado” “a living room with two white armchairs and a painting of the colosseum. the painting is mounted above a modern fireplace”
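Reranking generated samples with CLIP means scoring each candidate image against the caption and keeping the best ones. Here is a minimal sketch, assuming the image and caption embeddings have already been computed with CLIP's encoders. `rerank` is a hypothetical helper name.

```python
import numpy as np

def rerank(candidate_embs, caption_emb, top_k):
    """Return (indices, scores) of the top_k candidates by cosine similarity, best first."""
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    caption = caption_emb / np.linalg.norm(caption_emb)
    scores = cands @ caption
    order = np.argsort(-scores, kind="stable")[:top_k]  # stable: ties keep input order
    return order, scores[order]
```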
  • 15. CLIP for Image Search Text-to-Image Unsplash Image Search
  • 16. CLIP for Image Search Image-to-Image Unsplash Image Search
  • 17. CLIP for Image Search Text+Text-to-Image Unsplash Image Search
  • 18. CLIP for Image Search Image+Text-to-Image Unsplash Image Search + “cars”
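All four search modes reduce to nearest-neighbour search in CLIP's shared embedding space. The multi-query modes also need a way to merge several queries into one vector. This sketch assumes that merge is a weighted sum of L2-normalized embeddings, which is a common choice but not necessarily what the Unsplash demo does. It also assumes the gallery embeddings are precomputed.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def compose_query(embeddings, weights=None):
    """Merge text and/or image query embeddings into one unit-length search vector.

    Assumption: each query is normalized first so neither modality dominates by
    magnitude, then the (optionally weighted) sum is renormalized.
    """
    embs = normalize(np.asarray(embeddings, dtype=float))
    w = np.ones(len(embs)) if weights is None else np.asarray(weights, dtype=float)
    return normalize((w[:, None] * embs).sum(axis=0))

def search(gallery_embs, query, top_k=5):
    """Indices of the top_k gallery images by cosine similarity to the query."""
    scores = normalize(gallery_embs) @ query
    return np.argsort(-scores, kind="stable")[:top_k]
```

Text-to-image and image-to-image searches pass a single embedding to `compose_query`. Text+text and image+text searches pass two, for example an image embedding plus the text embedding of “cars”.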
  • 20. CLIP limitations Learning Transferable Visual Models From Natural Language Supervision ● poor generalization to images not covered in its pre-training dataset (MNIST)
  • 21. CLIP limitations Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers [Figure: explainability maps for the text queries “an elephant”, “a zebra”, “a lake”; examples from this Colab]
  • 22. CLIP limitations Learning Transferable Visual Models From Natural Language Supervision ● poor generalization to images not covered in its pre-training dataset (MNIST) ● difficulty counting the number of objects in an image ● difficulty predicting how close the nearest object is in a photo ● CLIP’s zero-shot classifiers can be sensitive to wording or phrasing and sometimes require trial-and-error “prompt engineering” to perform well.
  • 24. You can’t just make an Object Detector from a Classifier … without fine-tuning
  • 25. Assembling an Object Detector with CLIP Rich feature hierarchies for accurate object detection and semantic segmentation [Figure: R-CNN-style pipeline; labels: CLIP Text Encoder, “person”]
  • 26. Region proposals alternatives Salient Object Detection Techniques in Computer Vision—A Survey Salient object detection (SOD) is an important computer vision task aimed at precise detection and segmentation of visually distinctive image regions from the perspective of the human visual system
  • 27. Region proposals alternatives Open-World Entity Segmentation Entity Segmentation is a segmentation task with the aim to segment everything in an image into semantically-meaningful regions without considering any category labels.
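The R-CNN-style assembly works in four steps:

1. Take region proposals from the original R-CNN's selective search or from the alternatives above.
2. Embed each crop with CLIP's image encoder.
3. Score each crop against a text query.
4. Drop duplicates with non-maximum suppression.

`embed_crop` is a hypothetical helper that crops a box and runs CLIP's image encoder on it. The 0.25 score threshold is an assumption: raw CLIP cosine similarities are not calibrated probabilities and would need tuning per query.

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (M, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    order = np.argsort(-scores, kind="stable")
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[box_iou(boxes[best], boxes[rest]) < iou_thr]
    return np.asarray(keep, dtype=int)

def detect(image, proposals, embed_crop, text_emb, score_thr=0.25, iou_thr=0.5):
    """Score region proposals against one text query with CLIP, then apply NMS.

    embed_crop(image, box) is a hypothetical helper that crops `box` from `image`
    and returns its CLIP image embedding; text_emb is the CLIP text embedding of
    a prompt such as "a photo of a person".
    """
    crops = np.stack([embed_crop(image, box) for box in proposals])
    crops = crops / np.linalg.norm(crops, axis=1, keepdims=True)
    scores = crops @ (text_emb / np.linalg.norm(text_emb))
    mask = scores >= score_thr
    boxes, scores = proposals[mask], scores[mask]
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]
```

Because the classifier never sees localization training, box quality depends entirely on the proposals. This is consistent with the earlier point that a classifier does not become a detector without fine-tuning.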
  • 29. What is knowledge distillation? Knowledge Distillation: Simplified Knowledge distillation refers to the idea of model compression by teaching a smaller network, step by step, exactly what to do using a bigger, already-trained network.
  • 30. Why Mask R-CNN? ● Class-agnostic bbox regression and mask prediction Mask R-CNN
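A common concrete form of this idea is the soft-target loss from Hinton et al.'s "Distilling the Knowledge in a Neural Network". In that loss, the student matches the teacher's temperature-softened output distribution. This sketch assumes logits from both networks are available. The temperature T=4 is an arbitrary illustrative default.

```python
import numpy as np

def log_softmax(logits, T=1.0):
    # Temperature-scaled, numerically stable log-softmax over the last axis.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target loss: KL(teacher || student) on softened distributions, times T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as recommended by Hinton et al.
    """
    log_p_teacher = log_softmax(teacher_logits, T)
    log_p_student = log_softmax(student_logits, T)
    kl = (np.exp(log_p_teacher) * (log_p_teacher - log_p_student)).sum(axis=-1)
    return float(T ** 2 * kl.mean())
```

In practice this term is usually mixed with the ordinary cross-entropy on ground-truth labels.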
  • 31. Vision and Language Knowledge Distillation Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
  • 32. ViLD results Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
  • 33. ViLD generalization ability Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
  • 34. ViLD visualizations Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
  • 35. ViLD visualizations Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
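As we read the ViLD paper, the detector's region embeddings are trained with two objectives:

- ViLD-text classifies regions by cosine similarity to CLIP text embeddings of the base category names, plus a learned background embedding.
- ViLD-image pulls region embeddings toward CLIP image embeddings of the cropped proposals with an L1 loss.

The total loss is the text term plus a weighted image term. This sketch assumes all embeddings are given as arrays. The temperature `tau` is an illustrative value, not necessarily the paper's setting. At inference, text embeddings of novel categories can be appended to the classifier rows.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def vild_text_loss(region_embs, class_text_embs, background_emb, labels, tau=0.01):
    """ViLD-text: cross-entropy over cosine similarities to [background, base classes].

    labels[i] == 0 means background; labels[i] == k means the k-th base class
    (row k - 1 of class_text_embs). tau is an illustrative temperature.
    """
    classifier = normalize(np.vstack([background_emb[None, :], class_text_embs]))
    logits = normalize(region_embs) @ classifier.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def vild_image_loss(region_embs, clip_crop_embs):
    """ViLD-image: L1 distance between region embeddings and CLIP image embeddings
    of the same cropped proposals (both normalized first)."""
    return float(np.abs(normalize(region_embs) - normalize(clip_crop_embs)).sum(axis=1).mean())
```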
  • 36. Zero-Shot object tracking Introducing Zero Shot Object Tracking
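As we understand it, this approach plugs CLIP image embeddings of detected boxes into a Deep SORT tracker as the appearance feature. That lets objects be tracked for any class the detector handles. The sketch below shows only the appearance-matching step. It swaps in greedy matching for Deep SORT's Hungarian assignment and omits motion (Kalman filter) gating. `associate` and the 0.3 distance gate are assumptions for illustration.

```python
import numpy as np

def cosine_distance(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def associate(track_embs, det_embs, max_dist=0.3):
    """Greedily match detections to tracks by CLIP-embedding cosine distance.

    Returns (matches, unmatched_detections), where matches holds
    (track_index, detection_index) pairs; unmatched detections would start new tracks.
    """
    track_embs = np.asarray(track_embs, dtype=float)
    det_embs = np.asarray(det_embs, dtype=float)
    if len(track_embs) == 0 or len(det_embs) == 0:
        return [], list(range(len(det_embs)))
    dist = cosine_distance(track_embs, det_embs)
    matches, used_tracks, used_dets = [], set(), set()
    # Visit (track, detection) pairs from closest to farthest.
    for flat in np.argsort(dist, axis=None, kind="stable"):
        t, d = (int(i) for i in np.unravel_index(flat, dist.shape))
        if dist[t, d] > max_dist:
            break  # every remaining pair is even farther apart
        if t in used_tracks or d in used_dets:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_dets.add(d)
    unmatched = [d for d in range(len(det_embs)) if d not in used_dets]
    return matches, unmatched
```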
  • 38. VQGAN + CLIP The Illustrated VQGAN
  • 39. VQGAN + CLIP https://github.com/nerdyrodent/VQGAN-CLIP "A painting of an apple in a fruit bowl | psychedelic | surreal:0.5 | weird:0.25" "A painting of an apple in a fruit bowl"
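The prompt string in the example above uses a `text:weight` convention, with `|` separating prompts. Below is a minimal sketch of parsing that syntax and turning it into a guidance objective. It assumes a weight of 1 when none is given. The loss is a simplified weighted cosine distance and is not claimed to match the repository's exact implementation. A generator's latent code (VQGAN here, StyleGAN3 on the next slide) would be optimized to minimize it.

```python
import numpy as np

def _is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def parse_prompts(spec):
    """Split 'text | text:weight | ...' into (text, weight) pairs; weight defaults to 1.0."""
    prompts = []
    for part in spec.split("|"):
        part = part.strip()
        text, sep, weight = part.rpartition(":")
        if sep and _is_number(weight):
            prompts.append((text.strip(), float(weight)))
        else:
            prompts.append((part, 1.0))  # no valid weight suffix: keep the whole text
    return prompts

def guidance_loss(image_emb, prompt_embs, weights):
    """Weighted sum of (1 - cosine similarity) between the generated image's CLIP
    embedding and each prompt's CLIP text embedding (simplified objective)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    return float(np.sum(np.asarray(weights) * (1.0 - txt @ img)))
```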
  • 42. StyleGAN3 + CLIP StyleGAN3 + CLIP by mishin_learning
  • 43. Thank you for your attention! Yurii Pashchenko, Sr ML Engineer at Depositphotos, AI&BigData Online Day 2021 yurii_pas george.pashchenko@gmail.com