Computer Vision Landscape: Present and Future
Sanghamitra Deb
Staff Data Scientist
Chegg Inc
Data Day Texas, 2023
Outline
• Images
• Enhanced Transcription
o Data Story
o Computer Vision model
o Metrics
o Deployment
• Computer Vision Landscape
• Image Embeddings
Images
Disclaimer: Images are replicas representing real scenarios
Enhanced Transcription
Computer Vision Model -> Transcription Service
{"text": "Resonant ocean thicknesses at different forcing frequencies. (a) Location of Europa's first three largest resonant rotational-gravity modes as a function of forcing frequency and ocean thickness, for both zonal (m = 0) and sectoral (m = 2) degree-2 modes…"}
Reference paper: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2020GL088317
Data Story
Version 1
• Collect data on cropped images
• Build object detection model
• Measure performance
Performance: Not good enough
Lessons learned: CV models cannot read. Unless objects are well defined and distinct, detection has a lot of errors.
Version 2
Redefine problem --- Detect bounding boxes for
• Text
• Equations
• Diagrams and Charts
• UI Elements
• Tables
Performance was good.
This model is currently in production.
Version 3
Redefine problem --- Downstream applications need the text that was getting cropped out.
• Header Region
• Side Region
• Footer Region
• Question Region
• UI Elements
• Text
• Equations
• Diagrams & Charts
• Tables
Enhanced Transcription: Version 2
We are extracting Bounding Boxes.
• Text
• Equations
• Diagrams and Charts
• UI Elements
• Tables
[Figure: example images with labeled bounding boxes for Tables, Text, Equations, UI Elements, and Diagrams and Charts]
Building Object Detection Model: Training Pipeline
What is Object Detection?
Metrics: Intersection over Union
Predictions: bounding boxes (BB) and classification labels. IoU (area of overlap divided by area of union) is computed for each predicted bounding box against the ground-truth box.
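As a rough illustration of the metric, the sketch below computes IoU for two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for this example; the slides do not specify a box format.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2).

    The corner-coordinate format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union = sum of both areas minus the overlap counted twice
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes sharing a 1x1 corner: overlap 1, union 7
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1.0 means the predicted box matches the ground truth exactly; 0.0 means no overlap at all.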
Metrics: mAP@iou=0.5
Metrics are computed for a given IoU threshold.
Changing the IoU threshold can change whether a given prediction counts as a true positive or a false positive.
Average precision (AP) is computed for each class at an IoU threshold of 0.5. mAP is the mean of AP across all classes.
mAP@iou=0.5 >=0.8
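A minimal sketch of how AP at IoU 0.5 and mAP could be computed, for a single image and under simplifying assumptions: boxes use an assumed `(x1, y1, x2, y2)` format, each prediction is greedily matched to the best still-unmatched ground-truth box, and AP is the area under the precision-recall curve after making precision non-increasing. Evaluation toolkits such as the COCO API differ in details, so treat this as an illustration rather than a reference implementation. All function names are hypothetical.

```python
def iou(a, b):
    """IoU for boxes in (x1, y1, x2, y2) format (assumed format)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """preds: list of (confidence, box); gts: list of ground-truth boxes, one class."""
    if not gts:
        return 0.0
    matched = set()
    tp_flags = []
    # Walk predictions from most to least confident
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j not in matched and iou(box, gt) > best_iou:
                best_iou, best_j = iou(box, gt), j
        is_tp = best_iou >= iou_thr
        if is_tp:
            matched.add(best_j)  # each ground-truth box can be matched once
        tp_flags.append(is_tp)
    # Running precision and recall after each prediction
    tp = fp = 0
    precisions, recalls = [], []
    for flag in tp_flags:
        tp, fp = tp + int(flag), fp + int(not flag)
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(gts))
    # Make precision non-increasing, scanning from the right
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class, iou_thr=0.5):
    """per_class: dict mapping class name -> (preds, gts)."""
    aps = [average_precision(p, g, iou_thr) for p, g in per_class.values()]
    return sum(aps) / len(aps)


# Made-up example data: one false positive between two correct detections
text_preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
text_gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
table_preds = [(0.9, (0, 0, 10, 10))]
table_gts = [(0, 0, 10, 10)]
print(mean_average_precision({"Text": (text_preds, text_gts),
                              "Tables": (table_preds, table_gts)}))
```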
Collecting Training Data: Labelbox
• Retrieve archival images.
• Create annotation project.
• Write annotation guide. Make sure 5-10% of the data is reviewed for quality checks.
• Look for inter-annotator agreement on a small dataset.
• Collect labelled data.
• Do some spot checks for annotation quality.
Object Detection Models
Region-based Convolutional Neural Networks (R-CNN)
Cons: Very slow --- propagating thousands of region proposals (RPs) through the CNN & classifier takes a very long time
Vanishing/Exploding Gradients
Operation --- multiplying n small / large numbers to compute gradients of the “front” layers in an n-layer network.
When the network is deep, the product of n small numbers approaches zero (vanishes).
When the network is deep, the product of n large numbers becomes too large (explodes).
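The effect can be seen with plain arithmetic. The sketch below multiplies the same per-layer factor n times as a stand-in for a chained gradient. The factors 0.5 and 1.5 and the function name `gradient_scale` are illustrative assumptions, not values from any real network.

```python
import math


def gradient_scale(factor, n_layers):
    """Product of n identical per-layer factors, a toy stand-in for a chained gradient."""
    return math.prod([factor] * n_layers)


# The small factor shrinks toward zero; the large factor blows up as depth grows
for n in (5, 20, 50):
    print(n, gradient_scale(0.5, n), gradient_scale(1.5, n))
```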
ResNet (2015)
Right: regular CNN. Left: fit a residual F(x) = H(x) - x instead of the desired function H(x) directly. A skip / shortcut connection adds the input x to the output of a few weight layers.
Networks can be stacked to roughly 150 layers deep (e.g., ResNet-152).
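A toy sketch of the skip connection, using small fully connected layers in pure Python instead of convolutions (a simplification for readability; ResNet itself uses convolutional layers with batch normalization). The helper names are hypothetical. With all-zero weights, the plain block outputs zeros while the residual block passes the input through unchanged, which is the sense in which the identity mapping is easy for a residual block to represent.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def plain_block(x, w1, w2):
    # Two stacked layers must learn the desired mapping H(x) directly
    return relu(linear(w2, relu(linear(w1, x))))


def residual_block(x, w1, w2):
    # The layers learn the residual F(x); the skip connection adds x back: F(x) + x
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]
print(plain_block(x, zeros, zeros))     # input signal is lost
print(residual_block(x, zeros, zeros))  # input passes through the shortcut
```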
Plain Network vs ResNet
YOLO (You Only Look Once)
Unified Detection ---
• Uses features from the entire image for prediction.
• Predicts bounding boxes across all classes simultaneously.
• Bounding boxes and classes are predicted in one shot, i.e., by the same network.
Figure stages: divide input into a grid, class probability map, final detections
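To make the grid idea concrete: in the original YOLO formulation, the image is split into an S x S grid and the cell containing an object's center is responsible for predicting it. The sketch below finds that cell. The `(x1, y1, x2, y2)` box format, the default `S=7`, and the function name are assumptions for illustration; later versions such as YOLOv5 use anchor boxes at several scales, so this shows only the basic idea.

```python
def responsible_cell(box, img_w, img_h, s=7):
    """Return (row, col) of the grid cell that contains the box center."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Scale the center into grid units; clamp so a center on the far edge stays in range
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


# A 64x64 box in the top-left corner of a 448x448 image falls in cell (0, 0)
print(responsible_cell((0, 0, 64, 64), 448, 448))
```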
YOLOv5 network
Why YOLO?
o Faster speed: YOLO runs faster than other detection algorithms. A smaller model can process 155 frames per second.
o Accuracy: State-of-the-art performance on several object detection datasets, including COCO.
o Open source code is available in multiple deep learning frameworks.
o Code is well developed and easy to use.
Limitations: Recall is poor for small objects that are grouped together.
YOLOv5 PyTorch codebase
https://github.com/ultralytics/yolov5
Let's look into the repo
python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt
Model size: set by the pretrained weights (yolov5s.pt is the small model)
Batch size: set by --batch (16 here)
python detect.py --weights yolov5s.pt --source image.jpg
Deployment
• Load PyTorch model & predict bounding boxes
• Crop image with bounding box output
• Send cropped image to transcription service
• API output: {Transcribed text, Bounding box}
Version 2
Version 3
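The deployment flow can be sketched as a small function that chains detection, cropping, and transcription. Everything here is hypothetical: the detector and transcription service are passed in as plain callables (`detect`, `transcribe`), the image is a nested list of pixel values, and the output field names are invented for the example. The production service described in the talk is not shown in the slides.

```python
def crop(image, box):
    """Crop a nested-list image; box is (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def enhanced_transcription(image, detect, transcribe):
    """Detect regions, crop each one, transcribe it, and return text with its box."""
    results = []
    for det in detect(image):
        region = crop(image, det["box"])
        results.append({
            "text": transcribe(region),
            "bounding_box": det["box"],
            "label": det["label"],
        })
    return results


# Stand-ins for the object detection model and the external transcription service
def fake_detect(image):
    return [{"label": "Text", "box": (1, 0, 3, 2)}]


def fake_transcribe(region):
    return f"{len(region)}x{len(region[0])} region"


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(enhanced_transcription(image, fake_detect, fake_transcribe))
```

Passing the detector and the transcription client as arguments keeps the external vendor call easy to swap for a stub during load testing.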
Measuring effectiveness of the Enhanced Transcription
Annotation Task: Labelbox
Which Transcription is better?
Improves Coverage
When the entire image was sent to the transcription service, more than 5% of the images returned “no content found”.
Cropping the image using object detection removes low-quality surrounding elements, which recovers transcriptions for 2.7% of images.
Computer Vision Landscape
Diagram Embeddings
Example diagram topics: Pulley diagram, Newton’s second law, Friction, Acceleration, Moment of Inertia
Extract diagram embeddings from pre-trained models such as ResNet.
Use case
• Similarity-based applications --- recommendation systems.
• Converting general predictive models into multimodal models with text, image, and structured data features.
• Categorizing diagrams and creating a diagram ontology to create rich metadata.
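For the similarity use case, one common approach is to rank diagrams by cosine similarity between their embedding vectors. The sketch below assumes the embeddings have already been extracted. The three-dimensional vectors and diagram names are made up for illustration; real embeddings from a model such as ResNet would have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(query, catalog, k=2):
    """Return the names of the k catalog entries closest to the query embedding."""
    ranked = sorted(catalog.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings, invented for this example
catalog = {
    "pulley": [0.9, 0.1, 0.0],
    "friction": [0.8, 0.3, 0.1],
    "moment_of_inertia": [0.0, 0.2, 0.9],
}
print(most_similar([1.0, 0.2, 0.0], catalog))
```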
Takeaways
o Computer Vision models can see but they cannot read.
o Doing a deep dive on metrics ahead of building the model is a good practice.
o YOLO performs well out of the box. It's open source and readily available with very low latency.
o Building a service that combines outputs from external vendors requires careful load testing.
o Having a vision beyond immediate deliverables creates avenues for overall enrichment of ML products.
Thank You
@sangha_deb
sdeb@chegg.com
References
• Computer Vision Models : https://medium.com/augmented-startups/top-6-object-detection-algorithms-b8e5c41b952f.
https://www.v7labs.com/blog/yolo-object-detection#h1
• https://towardsdatascience.com/map-mean-average-precision-might-confuse-you-5956f1bfa9e2
• R-FCN : https://arxiv.org/pdf/1605.06409.pdf
• YOLOV5 - https://arxiv.org/pdf/2108.11539.pdf

More Related Content

Similar to Computer Vision Landscape : Present and Future

“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...Edge AI and Vision Alliance
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overviewLEE HOSEONG
 
2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong LeeMoazzem Hossain
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern PresentationDaniel Cahall
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple featuresHirantha Pradeep
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudGábor Szárnyas
 
Sem 2 Presentation
Sem 2 PresentationSem 2 Presentation
Sem 2 PresentationShalom Cohen
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNNJunho Cho
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IIYu Huang
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingYu Huang
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET Journal
 
Deep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxDeep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxJawadHaider36
 
Comparative Study of Object Detection Algorithms
Comparative Study of Object Detection AlgorithmsComparative Study of Object Detection Algorithms
Comparative Study of Object Detection AlgorithmsIRJET Journal
 
Mirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMeetupDataScienceRoma
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetRishabh Indoria
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)Yu Huang
 
Automated Data Exploration: Building efficient analysis pipelines with Dask
Automated Data Exploration: Building efficient analysis pipelines with DaskAutomated Data Exploration: Building efficient analysis pipelines with Dask
Automated Data Exploration: Building efficient analysis pipelines with DaskASI Data Science
 

Similar to Computer Vision Landscape : Present and Future (20)

“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overview
 
2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple features
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the Cloud
 
Sem 2 Presentation
Sem 2 PresentationSem 2 Presentation
Sem 2 Presentation
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object tracking
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
 
Deep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptxDeep Computer Vision - 1.pptx
Deep Computer Vision - 1.pptx
 
Comparative Study of Object Detection Algorithms
Comparative Study of Object Detection AlgorithmsComparative Study of Object Detection Algorithms
Comparative Study of Object Detection Algorithms
 
Mirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image Processing
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
 
Automated Data Exploration: Building efficient analysis pipelines with Dask
Automated Data Exploration: Building efficient analysis pipelines with DaskAutomated Data Exploration: Building efficient analysis pipelines with Dask
Automated Data Exploration: Building efficient analysis pipelines with Dask
 

More from Sanghamitra Deb

Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningSanghamitra Deb
 
Intro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingIntro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingSanghamitra Deb
 
NLP Classifier Models & Metrics
NLP Classifier Models & MetricsNLP Classifier Models & Metrics
NLP Classifier Models & MetricsSanghamitra Deb
 
Developing Recommendation System to provide a Personalized Learning experienc...
Developing Recommendation System to provide a PersonalizedLearning experienc...Developing Recommendation System to provide a PersonalizedLearning experienc...
Developing Recommendation System to provide a Personalized Learning experienc...Sanghamitra Deb
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsSanghamitra Deb
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningSanghamitra Deb
 
NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsSanghamitra Deb
 
Democratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsDemocratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsSanghamitra Deb
 
Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Sanghamitra Deb
 
Extracting knowledgebase from text
Extracting knowledgebase from textExtracting knowledgebase from text
Extracting knowledgebase from textSanghamitra Deb
 
Extracting medical attributes and finding relations
Extracting medical attributes and finding relationsExtracting medical attributes and finding relations
Extracting medical attributes and finding relationsSanghamitra Deb
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data ScienceSanghamitra Deb
 
Understanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsUnderstanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsSanghamitra Deb
 

More from Sanghamitra Deb (16)

odsc_2023.pdf
odsc_2023.pdfodsc_2023.pdf
odsc_2023.pdf
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learning
 
Intro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingIntro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic Modeling
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
NLP Classifier Models & Metrics
NLP Classifier Models & MetricsNLP Classifier Models & Metrics
NLP Classifier Models & Metrics
 
Developing Recommendation System to provide a Personalized Learning experienc...
Developing Recommendation System to provide a PersonalizedLearning experienc...Developing Recommendation System to provide a PersonalizedLearning experienc...
Developing Recommendation System to provide a Personalized Learning experienc...
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-experts
 
Democratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsDemocratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUs
 
Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.
 
Data day2017
Data day2017Data day2017
Data day2017
 
Extracting knowledgebase from text
Extracting knowledgebase from textExtracting knowledgebase from text
Extracting knowledgebase from text
 
Extracting medical attributes and finding relations
Extracting medical attributes and finding relationsExtracting medical attributes and finding relations
Extracting medical attributes and finding relations
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Understanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsUnderstanding Product Attributes from Reviews
Understanding Product Attributes from Reviews
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Computer Vision Landscape : Present and Future

  • 1. Computer Vision Landscape : Present and Future Sanghamitra Deb Staff Data Scientist Chegg Inc Data Day Texas, 2023
  • 2. Outline • Images • Enhanced Transcription o Data Story o Computer Vision model o Metrics o Deployment • Computer Vision Landscape • Image Embeddings
  • 3. Images Disclaimer: Images are replica’s representing real scenarios
  • 4. Enhanced Transcription Computer Vision Model Transcription Service {”text”:”Resonant ocean thicknesses at different forcing frequencies. (a) Location of Europa's first three largest resonant rotational-gravity modes as a function of forcing frequency and ocean thickness, for both zonal (m = 0) and sectoral (m = 2) degree-2 modes…..”} Reference paper: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2020GL088317
  • 5. Data Story. Version 1: collect data on cropped images; build an object detection model; measure performance. Versions 2 and 3 follow.
  • 7. Data Story, Version 1 outcome. Performance: not good enough. Lessons learned: CV models cannot read; unless objects are well defined and distinct, detection has a lot of errors.
  • 8. Data Story, Version 2. Redefine the problem: detect bounding boxes for text, equations, diagrams and charts, UI elements, and tables. Performance was good. This model is currently in production.
  • 9. Data Story, Version 3. Redefine the problem: downstream applications need the text that was getting cropped out. Regions to detect: header region, side region, footer region, question region, and UI elements.
  • 10. Data Story, all three versions side by side. Version 1: object detection on cropped images; performance was not good enough. Version 2: bounding boxes for text, equations, diagrams and charts, UI elements, and tables; performance was good and the model is in production. Version 3: header region, side region, footer region, question region, UI elements, text, equations, diagrams & charts, and tables.
  • 11. Enhanced Transcription: Version 2. We are extracting bounding boxes for text, equations, diagrams and charts, UI elements, and tables. Examples shown: tables, text.
  • 12. Enhanced Transcription: Version 2. Examples shown: equations, UI elements, diagrams and charts.
  • 13. Building Object Detection Model: Training Pipeline
  • 14. What is Object Detection?
  • 15. Metrics: Intersection over Union (IoU). Predictions: bounding boxes (BB) and classification labels. IoU is computed for each bounding box.
  • 16. Metrics: mAP@iou=0.5. Metrics are computed for a given IoU threshold. Changing the IoU threshold can flip the same prediction between a true positive and a false positive. Average precision is computed for each class at a threshold of 0.5. mAP is the mean across all classes. mAP@iou=0.5 >= 0.8.
  • 17. Collecting Training Data: LabelBox. Retrieve archival images. Create an annotation project. Write an annotation guide. Make sure 5-10% of the data is reviewed for quality checks. Look for inter-annotator agreement on a small dataset. Collect labelled data. Do some spot checks for annotation quality.
  • 19. Region-based Convolutional Neural Networks (R-CNN). Cons: very slow. Propagating thousands of region proposals (RPs) through the CNN and classifier takes a very long time.
  • 20. Vanishing/Exploding Gradients. Backpropagation multiplies n small or large numbers to compute the gradients of the “front” layers in an n-layer network. When the network is deep, the product of n small numbers shrinks toward zero (vanishing gradients), and the product of n large numbers grows too large (exploding gradients).
  • 21. ResNet (2015). Right: regular CNN. Left: fit a residual, F(x) = H(x) - x, instead of the desired function H(x) directly. A skip (shortcut) connection adds the input x to the output of a few weight layers. Layers can be stacked to about 150 layers deep.
  • 23. YOLO (You Only Look Once). Unified detection: • Uses features from the entire image for prediction. • Predicts bounding boxes across all classes simultaneously. • Bounding boxes and classes are predicted in one shot, i.e., by the same network. Figure stages: divide the input into a grid, class probability map, final detections.
  • 25. Why YOLO? o Speed: YOLO algorithms run faster than other detection algorithms. A smaller model can process 155 frames per second. o Accuracy: state-of-the-art performance on several object detection datasets, including COCO. o Open source code is available in multiple deep learning frameworks. o The code is well developed and easy to use. Limitations: small objects that are grouped together do not have good recall.
  • 26. YOLOv5 PyTorch codebase: https://github.com/ultralytics/yolov5. Let's look into the repo. Training: python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt (the weights file selects the model size; --batch sets the batch size). Detection: python detect.py --weights yolov5s.pt --source image.jpg
  • 27. Deployment. (1) Load the PyTorch model and predict bounding boxes. (2) Crop the image with the bounding box output. (3) Send the cropped image to the transcription service. API output: {transcribed text, bounding box}. Shown for Version 2 and Version 3.
  • 28. Measuring the effectiveness of Enhanced Transcription. Annotation task in Labelbox: which transcription is better?
  • 29. Improves Coverage. When the entire image was sent to the transcription service, more than 5% of the images returned “no content found”. Cropping the image using object detection removes low-quality surrounding elements, which recovers transcriptions for 2.7% of images.
  • 31. Diagram Embeddings. Example diagrams: pulley diagram, Newton’s second law, friction acceleration, moment of inertia. Extract diagram embeddings from pre-trained models such as ResNet. Use cases: • Similarity-based applications, such as recommendation systems. • Converting general predictive models into multimodal models with text, image, and structured data features. • Categorizing diagrams and creating a diagram ontology to create rich metadata.
  • 32. Takeaways. o Computer vision models can see, but they cannot read. o Doing a deep dive on metrics ahead of building the model is a good practice. o YOLO performs well out of the box. It is open source and readily available, with very low latency. o Building a service that combines outputs from external vendors requires careful load testing. o Having a vision beyond immediate deliverables creates avenues for overall enrichment of ML products.
  • 34. References • Computer Vision Models: https://medium.com/augmented-startups/top-6-object-detection-algorithms-b8e5c41b952f and https://www.v7labs.com/blog/yolo-object-detection#h1 • mAP: https://towardsdatascience.com/map-mean-average-precision-might-confuse-you-5956f1bfa9e2 • R-FCN: https://arxiv.org/pdf/1605.06409.pdf • YOLOv5: https://arxiv.org/pdf/2108.11539.pdf
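
As a worked illustration of the metrics on slides 15 and 16, the sketch below computes IoU for a pair of boxes and a simplified average precision at an IoU threshold of 0.5. It is not the code used in the talk. The function names, the (x_min, y_min, x_max, y_max) box format, and the single-image setting are assumptions made for illustration, and the average precision is a non-interpolated variant, so the numbers may differ slightly from COCO-style evaluators.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], truths: List[Box],
                      iou_thr: float = 0.5) -> float:
    """Simplified AP for one class on one image; preds are (confidence, box)."""
    if not truths:
        return 0.0
    matched = set()  # indices of ground-truth boxes already claimed
    hits = []        # True = true positive, False = false positive, in rank order
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, truth in enumerate(truths):
            if j not in matched and iou(box, truth) > best_iou:
                best_j, best_iou = j, iou(box, truth)
        is_tp = best_j >= 0 and best_iou >= iou_thr
        if is_tp:
            matched.add(best_j)
        hits.append(is_tp)
    # Average the precision measured at each true positive, over all ground truths.
    tp, total = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            total += tp / rank
    return total / len(truths)


def mean_average_precision(per_class: Dict[str, Tuple[List[Tuple[float, Box]], List[Box]]],
                           iou_thr: float = 0.5) -> float:
    """mAP: the mean of per-class AP values at one IoU threshold."""
    aps = [average_precision(p, t, iou_thr) for p, t in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Calling average_precision on the same prediction with iou_thr=0.5 and then iou_thr=0.75 shows the effect described on slide 16: a box that overlaps its ground truth with IoU around 0.6 counts as a true positive at the first threshold and a false positive at the second.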

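The deployment flow on slide 27 (predict bounding boxes, crop, send each crop to the transcription service) could be sketched roughly as follows. Everything here is an assumption for illustration: the image is a plain list of pixel rows standing in for a real image array, and `detect` and `transcribe` are hypothetical callables standing in for the PyTorch model and the external transcription service, whose real interfaces are not given in the slides.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # stand-in for a real image array (rows of pixels)
Box = Tuple[int, int, int, int]  # assumed format: (x_min, y_min, x_max, y_max)


def crop(image: Image, box: Box) -> Image:
    """Return the part of the image inside the box, clipped to the image bounds."""
    height = len(image)
    width = len(image[0]) if image else 0
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def enhanced_transcription(image: Image,
                           detect: Callable[[Image], List[Box]],
                           transcribe: Callable[[Image], str]) -> List[Dict]:
    """Detect regions, crop each one, and transcribe the crop.

    `detect` and `transcribe` are hypothetical placeholders for the object
    detection model and the transcription service.
    """
    results = []
    for box in detect(image):
        region = crop(image, box)
        if not region or not region[0]:
            continue  # the box fell outside the image, nothing to transcribe
        results.append({"text": transcribe(region), "bounding_box": box})
    return results
```

Passing the model and the service in as callables keeps the sketch testable with stubs, which also fits the load-testing and contract-testing concerns raised in the takeaways.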
Editor's Notes

  1. Classification and localization: done using regression.
  2. Selective search extracts several thousand region proposals (RPs). Each region proposal is labeled with a class and a ground-truth bounding box. A pre-trained CNN extracts features for the region proposals through forward propagation. These features are used to predict the class and bounding box of each region proposal using SVMs and linear regression. ROI pooling is followed by fully connected (FC) layers for classification and bounding box regression. The FC layers after ROI pooling are not shared across ROIs and take time. This makes R-CNN approaches slow, and the fully connected layers have a large number of parameters. Fast R-CNN performs the CNN forward propagation once on the entire image. Faster R-CNN replaces selective search with a region proposal network (RPN), which reduces the total number of region proposals and further improves speed.
  3. YOLO reasons globally about the full image … YOLO models treat object detection as a regression problem. The model divides the image into an S × S grid and, for each grid cell, predicts B bounding boxes, confidence scores for those boxes, and C class probabilities. These predictions are encoded as an S × S × (B ∗ 5 + C) tensor.
  4. The details of the architecture are beyond the scope of this presentation. YOLOv5 has improvements in data augmentation compared to previous models. ResNet is one of the backbones used in the architecture for extracting features. Transformers are used in the prediction head. Predictions from multiple heads are ensembled using techniques such as non-max suppression to predict the bounding boxes. Additionally, a ResNet model is trained using image patches cropped from the training data as a classification training set.
  5. Test for I/ contract. Send images that have no text and check the output. Make sure there is logging.
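
Notes 3 and 4 mention the S × S × (B ∗ 5 + C) output tensor and non-max suppression. The sketch below works through both with small numbers. It is a minimal greedy, class-agnostic version written for illustration, not the YOLOv5 implementation. The function names and the (x_min, y_min, x_max, y_max) box format are assumptions, and S = 7, B = 2, C = 20 are example values.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)


def yolo_output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Shape of the prediction tensor: each box contributes x, y, w, h, confidence."""
    return (S, S, B * 5 + C)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections: List[Tuple[float, Box]],
                        iou_thr: float = 0.5) -> List[Tuple[float, Box]]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    kept: List[Tuple[float, Box]] = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) < iou_thr for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

With S = 7, B = 2, and C = 20, yolo_output_shape gives a 7 × 7 × 30 tensor. In the NMS function, two heavily overlapping detections of the same object collapse to the higher-scoring one, while a distant detection is left untouched.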