SlideShare ist ein Scribd-Unternehmen logo
1 von 47
Downloaden Sie, um offline zu lesen
let's build a system to describe an
image
Debarko De (@debarko)
Engineering Manager @ Practo
Who am I?
● Debarko De
● twitter.com/@debarko
● ex @Hashcube
○ Facebook
○ Mobile platforms
● Engineering Manager @ Practo
○ eCommerce Division of Practo
○ Medicines & Lab Tests
● AI ML Application Developer (Researcher)
Agenda
● Problem Statement
● Basic building blocks for the network
● CNN
● Transfer Learning
● RNN
● LSTM
● How do we wire them together?
● Code
● Other places this can be implemented
● Interaction & Questions
Setting right expectations
● I’ll cover the basic theory needed to understand the network
● This is not a math session about NNs
● It’s more about implementation rather than theory
● Preferable I would like to take the questions at the end
● Slides and Code will be shared separately
● Time is of essence, so let's begin.
What are we trying to do?
Use cases
● Visual to Text systems for blind people
● Search Engines for searching medical records
● Auto Tagging medical imaging data
● Tagging video consultations
Parts to an Image Captioning System
● CNN
● RNN with LSTM unit
● Training Data
● Training
● Eval with BLEU Score
Convolutional Neural Networks
CNN vs Traditional Imaging Ways
References for CNN
● Image classification using CNN
https://www.slideshare.net/debarko/image-classification-using-cnn
● Deep Learning for Noobs (Part 1 & Part 2)
https://hackernoon.com/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d
● Karpathy’s Article on CNN
http://cs231n.github.io/convolutional-networks/
Recurrent Neural Network
RNN
● As humans we understand context
● Every single time we don’t reset our understanding
● Thoughts have persistence
● Traditional NNs like CNNs don’t have persistence
● Usage: speech recognition, language modeling, translation
PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain'd into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.
Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.
DUKE VINCENTIO:
Well, your wit is in the care of side and that.
Second Lord:
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.
Clown:
Come, sir, I will make did behold your worship.
VIOLA: I'll drink it.
● RNNs are supposed to remember
● Like in video understanding based on previous scenes
● Can RNNs do it?
Long term dependency problem
the grass is _____
the grass is green
In a long sentence that is really big do you think that RNNs
can really remember the text that happened here? Will it
remember whether the sentence was ____ or ____?
In a long sentence that is really big do you think that RNNs
can really remember the text that happened here? Will it
remember whether the sentence was ____ or ____?
In a long sentence that is really big do you think that RNNs
can really remember the text that happened here? Will it
remember whether the sentence was long or short?
RNNs can’t connect the dots ...
● RNNs technically should be capable to store long dependencies
● In practice the data gets diluted
● This problem gave birth to LSTM Networks
Long term dependency problem
Long Short Term Memory networks
A Regular RNN
LSTM
Sigmoid and tanh neural layers
LSTM
Sigmoid and tanh neural layers
Final Model
Training Data
1. Flickr 8k Dataset (Link)
2. Flickr 30k Dataset (Link)
3. Microsoft COCO Dataset (Link)
8kDB: M. Hodosh, P. Young and J. Hockenmaier (2013) "Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics", Journal of Artifical Intellegence Research,
Volume 47, pages 853-899 http://www.jair.org/papers/paper3994.html
Let’s get our hands dirty
Get Set Code...
Evaluation
BLEU, or the Bilingual Evaluation Understudy, is a score for comparing a candidate
translation of text to one or more reference translations.
Human captions are the reference translations
Generated text via the LSTM network is the candidate translation
DEMO
https://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/
http://localhost:3000/ - Demo server running locally on my laptop
Other places to use this
● Fashion websites can ask users to take photos and then get matching dresses for
sale
● Blind people can use a mobile app which reads out the current scene, helping
them understand the environment better and also enjoy the beauty of sunset.
● Image search engines can be built on this
Current Production Ready models
● Google Show and Tell
● NeuralTalk2 (https://github.com/karpathy/neuraltalk2)
● https://www.leadergpu.com/
References
● http://colah.github.io/posts/2015-08-Understanding-LSTMs/
● http://www.jair.org/media/3994/live-3994-7274-jair.pdf
● https://github.com/debarko/truview_models
● https://github.com/debarko/truview
● https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.ht
ml
● https://github.com/tensorflow/models/tree/master/research/im2txt
● One of the best GPU server farms (https://www.leadergpu.com/)
● https://en.wikipedia.org/wiki/BLEU
fin.
@debarko

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers leopauly
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingDerek Kane
 
Survey of Attention mechanism
Survey of Attention mechanismSurvey of Attention mechanism
Survey of Attention mechanismSwatiNarkhede1
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Simplilearn
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Jeong-Gwan Lee
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...Simplilearn
 
State of transformers in Computer Vision
State of transformers in Computer VisionState of transformers in Computer Vision
State of transformers in Computer VisionDeep Kayal
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroBill Liu
 
INTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRUINTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRUSri Geetha
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...changedaeoh
 
Image captioning using DL and NLP.pptx
Image captioning using DL and NLP.pptxImage captioning using DL and NLP.pptx
Image captioning using DL and NLP.pptxMrUnknown820784
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationGioele Ciaparrone
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMDivya Gera
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text SummarizationTho Phan
 

Was ist angesagt? (20)

Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
 
Survey of Attention mechanism
Survey of Attention mechanismSurvey of Attention mechanism
Survey of Attention mechanism
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
 
Vgg
VggVgg
Vgg
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
 
State of transformers in Computer Vision
State of transformers in Computer VisionState of transformers in Computer Vision
State of transformers in Computer Vision
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to Hero
 
INTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRUINTRODUCTION TO NLP, RNN, LSTM, GRU
INTRODUCTION TO NLP, RNN, LSTM, GRU
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
 
Image captioning using DL and NLP.pptx
Image captioning using DL and NLP.pptxImage captioning using DL and NLP.pptx
Image captioning using DL and NLP.pptx
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text Summarization
 
cnn ppt.pptx
cnn ppt.pptxcnn ppt.pptx
cnn ppt.pptx
 

Ähnlich wie Image captioning with Keras and Tensorflow - Debarko De @ Practo

Conversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopConversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopTom Bocklisch
 
Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...PyData
 
Deprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackDeprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackJustina Petraitytė
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deakin University
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingGrigory Sapunov
 
Keras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningKeras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningDr. Ananth Krishnamoorthy
 
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0Plain Concepts
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchNatasha Latysheva
 
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfOrtus Solutions, Corp
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionLiad Magen
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsShreyas Suresh Rao
 
Blockchain Experience Design Meetup #1
Blockchain Experience Design Meetup #1Blockchain Experience Design Meetup #1
Blockchain Experience Design Meetup #1Gendry Morales
 
AIS Technical Development Workshop 2: Text Analytics with Python
AIS Technical Development Workshop 2: Text Analytics with PythonAIS Technical Development Workshop 2: Text Analytics with Python
AIS Technical Development Workshop 2: Text Analytics with PythonNhi Nguyen
 
Writing effective content
Writing effective contentWriting effective content
Writing effective contentHarshal Patil
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2Karthik Murugesan
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxsaivinay93
 

Ähnlich wie Image captioning with Keras and Tensorflow - Debarko De @ Practo (20)

Conversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopConversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData Workshop
 
Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...
 
Deprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackDeprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stack
 
Understanding deep learning
Understanding deep learningUnderstanding deep learning
Understanding deep learning
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2
 
Messaging
MessagingMessaging
Messaging
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
 
Keras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningKeras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learning
 
Messaging
MessagingMessaging
Messaging
 
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full version
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application Trends
 
Blockchain Experience Design Meetup #1
Blockchain Experience Design Meetup #1Blockchain Experience Design Meetup #1
Blockchain Experience Design Meetup #1
 
AIS Technical Development Workshop 2: Text Analytics with Python
AIS Technical Development Workshop 2: Text Analytics with PythonAIS Technical Development Workshop 2: Text Analytics with Python
AIS Technical Development Workshop 2: Text Analytics with Python
 
Writing effective content
Writing effective contentWriting effective content
Writing effective content
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptx
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Image captioning with Keras and Tensorflow - Debarko De @ Practo

  • 1. let's build a system to describe an image Debarko De (@debarko) Engineering Manager @ Practo
  • 2. Who am I? ● Debarko De ● twitter.com/@debarko ● ex @Hashcube ○ Facebook ○ Mobile platforms ● Engineering Manager @ Practo ○ eCommerce Division of Practo ○ Medicines & Lab Tests ● AI ML Application Developer (Researcher)
  • 3. Agenda ● Problem Statement ● Basic building blocks for the network ● CNN ● Transfer Learning ● RNN ● LSTM ● How do we wire them together? ● Code ● Other places this can be implemented ● Interaction & Questions
  • 4. Setting right expectations ● I’ll cover the basic theory needed to understand the network ● This is not a math session about NNs ● It’s more about implementation rather than theory ● Preferable I would like to take the questions at the end ● Slides and Code will be shared separately ● Time is of essence, so let's begin.
  • 5. What are we trying to do?
  • 6.
  • 7.
  • 8. Use cases ● Visual to Text systems for blind people ● Search Engines for searching medical records ● Auto Tagging medical imaging data ● Tagging video consultations
  • 9. Parts to an Image Captioning System ● CNN ● RNN with LSTM unit ● Training Data ● Training ● Eval with BLEU Score
  • 11.
  • 12.
  • 13. CNN vs Traditional Imaging Ways
  • 14.
  • 15. References for CNN ● Image classification using CNN https://www.slideshare.net/debarko/image-classification-using-cnn ● Deep Learning for Noobs (Part 1 & Part 2) https://hackernoon.com/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d ● Karpathy’s Article on CNN http://cs231n.github.io/convolutional-networks/
  • 16.
  • 18. RNN ● As humans we understand context ● Every single time we don’t reset our understanding ● Thoughts have persistence ● Traditional NNs like CNNs don’t have persistence ● Usage: speech recognition, language modeling, translation
  • 19.
  • 20.
  • 21.
  • 22. PANDARUS: Alas, I think he shall be come approached and the day When little srain would be attain'd into being never fed, And who is but a chain and subjects of his death, I should not sleep. Second Senator: They are away this miseries, produced upon my soul, Breaking and strongly should be buried, when I perish The earth and thoughts of many states. DUKE VINCENTIO: Well, your wit is in the care of side and that. Second Lord: They would be ruled after this chamber, and my fair nues begun out of the fact, to be conveyed, Whose noble souls I'll have the heart of the wars. Clown: Come, sir, I will make did behold your worship. VIOLA: I'll drink it.
  • 23. ● RNNs are supposed to remember ● Like in video understanding based on previous scenes ● Can RNNs do it? Long term dependency problem
  • 24. the grass is _____
  • 25. the grass is green
  • 26. In a long sentence that is really big do you think that RNNs can really remember the text that happened here? Will it remember whether the sentence was ____ or ____?
  • 27. In a long sentence that is really big do you think that RNNs can really remember the text that happened here? Will it remember whether the sentence was ____ or ____?
  • 28. In a long sentence that is really big do you think that RNNs can really remember the text that happened here? Will it remember whether the sentence was long or short?
  • 29. RNNs can’t connect the dots ...
  • 30. ● RNNs technically should be capable to store long dependencies ● In practice the data gets diluted ● This problem gave birth to LSTM Networks Long term dependency problem
  • 31. Long Short Term Memory networks
  • 33. LSTM Sigmoid and tanh neural layers
  • 34. LSTM Sigmoid and tanh neural layers
  • 36. Training Data 1. Flickr 8k Dataset (Link) 2. Flickr 30k Dataset (Link) 3. Microsoft COCO Dataset (Link) 8kDB: M. Hodosh, P. Young and J. Hockenmaier (2013) "Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics", Journal of Artifical Intellegence Research, Volume 47, pages 853-899 http://www.jair.org/papers/paper3994.html
  • 37.
  • 38. Let’s get our hands dirty
  • 39.
  • 41. Evaluation BLEU, or the Bilingual Evaluation Understudy, is a score for comparing a candidate translation of text to one or more reference translations. Human captions are the reference translations Generated text via the LSTM network is the candidate translation
  • 43. Other places to use this ● Fashion websites can ask users to take photos and then get matching dresses for sale ● Blind people can use a mobile app which reads out the current scene, helping them understand the environment better and also enjoy the beauty of sunset. ● Image search engines can be built on this
  • 44. Current Production Ready models ● Google Show and Tell ● NeuralTalk2 (https://github.com/karpathy/neuraltalk2) ● https://www.leadergpu.com/
  • 45. References ● http://colah.github.io/posts/2015-08-Understanding-LSTMs/ ● http://www.jair.org/media/3994/live-3994-7274-jair.pdf ● https://github.com/debarko/truview_models ● https://github.com/debarko/truview ● https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.ht ml ● https://github.com/tensorflow/models/tree/master/research/im2txt ● One of the best GPU server farms (https://www.leadergpu.com/) ● https://en.wikipedia.org/wiki/BLEU
  • 46. fin.