1. The speaker will demonstrate object detection on Android using TensorFlow and the SSD model.
2. SSD is well-suited for mobile as it is faster than other models like Faster R-CNN while maintaining reasonable accuracy.
3. The example will involve gathering image data, labeling objects, training an SSD model in TensorFlow, and integrating it into an Android app for real-time clothes detection on mobile.
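A detector like SSD emits many overlapping candidate boxes per object, so a post-processing step such as non-maximum suppression (NMS) is typically applied before showing results on screen. The following is a minimal pure-Python sketch of that step, not code from the talk; the function names are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

On-device frameworks usually provide a built-in NMS, but the logic is the same: suppress any detection that mostly overlaps a more confident one.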
The document summarizes Assaf Mushinsky's presentation at CVPR 2017. Some key points:
- He discussed state-of-the-art research in object detection, segmentation, pose estimation, and network architectures from papers presented at CVPR 2017.
- Papers presented efficient object detection methods that improved speed and accuracy trade-offs like YOLO9000 and Feature Pyramid Networks. Mask R-CNN was discussed for instance segmentation and pose estimation.
- New network architectures like Densely Connected Networks, Xception, and ResNeXt were covered that improved accuracy and efficiency over ResNet and Inception.
- The presentation highlighted recent advances in computer vision from the CVPR conference but did not cover older work.
This is an intensive meetup at Samsung Next IL covering the most interesting papers that were presented at CVPR 2017 last month. It is a good opportunity to get an overview of recent advancements in the field of Deep Learning with applications to Computer Vision.
The following topics are covered:
• Object detection
• Pose estimation
• Efficient networks
1) The document presents DAVE, a unified framework using two CNNs (FVPN and ALN) for fast vehicle detection and annotation of attributes like pose, color, and type.
2) The FVPN is a shallow fully convolutional network that efficiently generates vehicle proposals. The ALN is based on GoogLeNet and extended with additional layers for multi-task learning of vehicle attributes.
3) The two networks are jointly trained using a large vehicle dataset, with the FVPN providing proposals to the ALN for attribute annotation. Experiments show DAVE outperforms other methods on vehicle detection and annotation tasks.
Automatic Image Cropping - A journey from a Master Thesis to Production (Alexey Grigorev)
The document discusses developing a neural network model for automatic image cropping. It proposes that properly cropped images can improve listing performance on online classifieds sites. The model uses a Deeply Supervised Salient Network (DSS) which improves on fully convolutional networks with deep supervision and short connections. Experiments were conducted on an online classifieds site by manually evaluating cropped images in different categories. The best performing category of engagement rings was selected for initial deployment. The system architecture includes components for image enhancement, cropping, hosting, and a frontend interface.
This document discusses deep learning techniques for object detection and recognition. It provides an overview of computer vision tasks like image classification and object detection. It then discusses how crowdsourcing large datasets from the internet and advances in machine learning, specifically deep convolutional neural networks (CNNs), have led to major breakthroughs in object detection. Several state-of-the-art CNN models for object detection are described, including R-CNN, Fast R-CNN, Faster R-CNN, SSD, and YOLO. The document also provides examples of applying these techniques to tasks like face detection and detecting manta rays from aerial videos.
Deep reinforcement learning framework for autonomous driving (GopikaGopinath5)
Motivated by Google DeepMind's successful demonstrations of learning to play Atari games and Go, the authors propose a framework for autonomous driving using deep reinforcement learning.
It incorporates Recurrent Neural Networks for information integration, enabling the car to handle partially observable scenarios.
Presented by Mr. Dinesh KS
Software Developer, Livares Technologies
Introduction
Object detection is a computer technology related to computer vision and image processing that
deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or
cars) in digital images and videos.
Face detection is a computer technology being used in a variety of applications that identifies
human faces in digital images.
The document discusses implementing deep learning algorithms for object detection and scene perception in self-driving cars. It compares the YOLO and Faster R-CNN models, finding that Faster R-CNN has higher accuracy (mAP of 41.8) but lower speed (17.1 FPS), while YOLO has lower accuracy (mAP of 18.6) but higher speed (212.4 FPS). The authors conclude that achieving both high accuracy and high speed remains a goal for future work, which could explore using newer versions of YOLO or other models.
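The mAP figures quoted above are means over per-class average precision (AP) scores. As a reminder of what one AP value measures, here is a small pure-Python sketch (not from the paper) that computes AP for a single class from detections ranked by confidence:

```python
def average_precision(ranked_hits, num_positives):
    """AP for one class: average of the precision values at each true-positive rank.

    ranked_hits: detections sorted by descending confidence, 1 for a correct
    match to a ground-truth box, 0 for a false positive.
    num_positives: number of ground-truth boxes of this class.
    """
    tp = 0
    precisions = []
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            precisions.append(tp / rank)
    if num_positives == 0:
        return 0.0
    return sum(precisions) / num_positives
```

Averaging this value across all classes gives the mAP used to compare YOLO and Faster R-CNN.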
Content-based image retrieval (CBIR) uses computer vision techniques to search for and retrieve images from large databases based on visual similarities. CBIR systems typically extract features from images and measure similarities to return images matching a query image. Popular applications include Google Images, eBay, and Pinterest. Evaluation of CBIR systems focuses on precision and recall metrics, as precision alone is insufficient without also considering recall. Training siamese networks for CBIR requires loss functions that pull similar images closer together and push dissimilar images farther apart.
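The "pull similar images closer, push dissimilar images apart" objective described above is what the classic contrastive loss for siamese networks does. A minimal pure-Python sketch (the margin value is illustrative):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb1, emb2, similar, margin=1.0):
    """similar=1: loss = d^2, pulling the pair together.
    similar=0: loss = max(0, margin - d)^2, pushing it apart up to the margin."""
    d = euclidean(emb1, emb2)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Dissimilar pairs already farther apart than the margin contribute zero loss, so the network only spends capacity separating pairs that are still confusable.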
YouTube: https://youtu.be/XSoau_q0kz8
** Data Science Certification using R: https://www.edureka.co/data-science **
This Edureka PPT on "KNN algorithm using R" will help you learn about the KNN algorithm in depth; you'll also see how KNN is used to solve real-world problems. Below are the topics covered in this module:
Introduction to Machine Learning
What is KNN Algorithm?
KNN Use Case
KNN Algorithm step by step
Hands-On
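The "KNN algorithm step by step" portion boils down to: compute the distance from the query to every training point, take the k nearest, and majority-vote their labels. The slides use R; the equivalent logic in a short Python sketch:

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Indices of training points sorted by distance to the query.
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in nearest[:k])
    return votes.most_common(1)[0][0]
```

There is no training phase: all the work happens at query time, which is why KNN is called a lazy learner.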
Blog Series: http://bit.ly/data-science-blogs
Data Science Training Playlist: http://bit.ly/data-science-playlist
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Pelee: a real time object detection system on mobile devices - Paper Review (LEE HOSEONG)
This document summarizes the Pelee object detection system which uses the PeleeNet efficient feature extraction network for real-time object detection on mobile devices. PeleeNet improves on DenseNet with two-way dense layers, a stem block, dynamic bottleneck layers, and transition layers without compression. Pelee uses SSD with PeleeNet, selecting fewer feature maps and adding residual prediction blocks for faster, more accurate detection compared to SSD and YOLO. The document concludes that PeleeNet and Pelee achieve real-time classification and detection on devices, outperforming existing models in speed, cost and accuracy with simple code.
Artificial intelligence use cases for International Dating Apps. iDate 2018. ... (Lluis Carreras)
As Andrew Ng says, AI is the new electricity and will transform many industries; dating is therefore going to be transformed by AI as well.
Drawing on AI studied at university several years ago and kept up to date since 2016, plus 10 years of working experience in the dating industry, this presentation shows the evolution of AI over recent years and some AI examples already used in dating services.
It then shows where AI can be applied to dating services, what is needed, which models can be used, and how the building process works.
Automatism System Using Faster R-CNN and SVM (IRJET Journal)
The document describes a proposed system to automatically manage vacant parking spaces using computer vision techniques. The system would use existing surveillance cameras installed in parking lots. It detects vehicles in images using a Faster R-CNN object detection model. This model uses a Region Proposal Network to quickly detect objects. An SVM classifier is then used to classify detected objects as free or occupied parking spaces. The goal is to assist drivers in finding available spaces more efficiently.
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor... (Artur Filipowicz)
This presentation explores the interaction between virtual reality simulation and Deep Learning which may develop computer vision that rivals human vision. The specific problem considered is detection and localization of a stop object, the stop sign, based on an image. A video game, Grand Theft Auto 5, is used to collect over half a million images and corresponding ground truth labels with and without stop signs in various lighting and weather conditions. A deep convolutional neural network trained on this data and fine tuned on real world data achieves accuracy in stop sign detection of over 95% within 20 meters of the stop sign and has a false positive rate of 4% on test data from the real world. Additionally, the physical constraints on this problem are analysed, and a framework for the use of simulators is developed.
The document provides an introduction to computer vision concepts including neural network structures, activation functions, convolution operators, pooling layers, and batch normalization. It then discusses image classification, including popular datasets, classification networks from LeNet to DLA, and experiments on car brand classification. Finally, it covers object detection, comparing region-based methods like R-CNN, Fast R-CNN, Faster R-CNN, and R-FCN to region-free methods like YOLO.
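The convolution operator mentioned in that introduction slides a small kernel over the image, taking an elementwise product-sum at each position. A pure-Python sketch of a "valid"-mode 2D convolution (no padding or stride), added here for illustration:

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, as used in CNN layers: slide the kernel
    over the image and take the elementwise product-sum at each position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out
```

Real frameworks implement this with highly optimized matrix routines, but the arithmetic is exactly this loop; pooling layers replace the product-sum with a max or mean over the window.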
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn... (gdgsurrey)
Dive into the essentials of ML model development, processes, and techniques to combat underfitting and overfitting, explore distributed training approaches, and understand model explainability. Enhance your skills with practical insights from a seasoned expert.
Avihu Efrat's Viola and Jones face detection slides (wolf)
The document summarizes the Viola-Jones object detection framework. It uses a cascade of classifiers with increasingly more complex features trained with AdaBoost to rapidly detect objects. Integral images allow for very fast feature evaluations. The framework was applied to face detection, achieving very fast average detection speeds of 270 microseconds per sub-window while maintaining low false positive rates.
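The integral image is what makes those feature evaluations fast: after one pass over the image, the sum of any rectangle costs just four lookups. A small pure-Python sketch of the idea:

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows 0..y-1 and columns 0..x-1.
    Padded with a zero row and column so lookups need no bounds checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x1, y1, x2, y2):
    """Sum of img[y1..y2-1][x1..x2-1] via four integral-image lookups."""
    return ii[y2][x2] - ii[y1][x2] - ii[y2][x1] + ii[y1][x1]
```

A Haar-like feature is then just a difference of two or three such rectangle sums, so its cost is constant regardless of the rectangle size.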
Supervised learning involves using a training dataset to learn a target function that can be used to predict class labels or attribute values. The document discusses supervised learning and classification, including types of supervised learning problems like classification and regression. It provides examples of classification algorithms like K-nearest neighbors, decision trees, naive Bayes, and support vector machines. It also gives examples of how to implement classification algorithms using scikit-learn and discusses evaluating classification model performance based on accuracy.
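The evaluate-by-accuracy idea from that summary is simply the fraction of correct predictions on held-out data. A toy sketch using a nearest-centroid classifier (illustrative only, not scikit-learn's API):

```python
import math

def fit_centroids(points, labels):
    """Compute the mean point of each class."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        s = sums.setdefault(lab, [0.0] * len(p))
        for i, v in enumerate(p):
            s[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(v / counts[lab] for v in s) for lab, s in sums.items()}

def predict(centroids, point):
    """Assign the class whose centroid is nearest to the point."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], point))

def accuracy(centroids, points, labels):
    """Fraction of held-out points classified correctly."""
    correct = sum(predict(centroids, p) == lab for p, lab in zip(points, labels))
    return correct / len(points)
```

Accuracy is the simplest evaluation metric; as several of the other summaries here note, imbalanced problems usually need precision and recall as well.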
Wapid and wobust active online machine leawning with Vowpal Wabbit (Antti Haapala)
Vowpal Wabbit is a machine learning library that provides fast, scalable, and online learning algorithms. It can handle large datasets with millions of features efficiently using hashing and sparse representations. Unlike other libraries, Vowpal Wabbit is designed for online and active learning, allowing the model to be updated continuously as new data is processed. It performs linear learning rapidly using stochastic gradient descent and has been shown to scale to billions of examples and trillions of features.
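The two ideas credited above, the hashing trick and online SGD on sparse features, can be sketched in a few lines of Python. This is a toy illustration of the technique, not Vowpal Wabbit's actual implementation:

```python
import math

def feature_index(name, num_buckets=2 ** 18):
    """Hashing trick: map a feature name straight to a weight index,
    so no vocabulary dictionary ever needs to be stored."""
    return hash(name) % num_buckets

def sgd_update(weights, features, label, lr=0.5):
    """One online logistic-regression step on a sparse example.
    features: dict of name -> value; label: 0 or 1. Returns the prediction
    made before the weights are updated."""
    score = sum(weights.get(feature_index(n), 0.0) * v for n, v in features.items())
    pred = 1.0 / (1.0 + math.exp(-score))
    grad = pred - label
    for n, v in features.items():
        idx = feature_index(n)
        weights[idx] = weights.get(idx, 0.0) - lr * grad * v
    return pred
```

Because the model is just a weight table updated one example at a time, it never needs the full dataset in memory, which is what makes this style of learner scale to billions of examples.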
This document discusses using fully convolutional neural networks for defect inspection. It begins with an agenda that outlines image segmentation using FCNs and defect inspection. It then provides details on data preparation including labeling guidelines, data augmentation, and model setup using techniques like deconvolution layers and the U-Net architecture. Metrics for evaluating the model like Dice score and IoU are also covered. The document concludes with best practices for successful deep learning projects focusing on aspects like having a large reusable dataset, feasibility of the problem, potential payoff, and fault tolerance.
VIBE: Video Inference for Human Body Pose and Shape Estimation (Arithmer Inc.)
The document describes the VIBE approach for 3D human pose and shape estimation from video. VIBE uses an adversarial learning framework with a temporal encoder network that incorporates self-attention. It regresses pose and shape parameters from video frames. A motion discriminator is trained to distinguish real from generated poses, enforcing kinematically plausible poses without 3D ground truth labels. Results show VIBE generates accurate 3D poses and shapes from in-the-wild videos.
Object Detection for Autonomous Cars using AI/ML (IRJET Journal)
The document discusses using machine learning and computer vision techniques for object detection in autonomous vehicles. Specifically, it proposes using the Single Shot Detector (SSD) algorithm to identify and classify objects around a self-driving car from camera images. The SSD model was trained on a dataset to detect common objects like cars, people, buses etc. and estimate bounding boxes around detected objects. The methodology uses OpenCV and TensorFlow to implement SSD on images from a webcam in real-time. While bounding boxes were sometimes inconsistent in dense traffic, detection was more accurate for objects closer to the camera or in less crowded scenarios. The goal is to demonstrate how computer vision allows autonomous vehicles to perceive their surroundings.
The document summarizes Md Abul Hayat's research on image segmentation using deep neural networks. It discusses using various CNN architectures like autoencoders, fully convolutional networks, U-Net, ResNet, and DenseNet for segmenting OCT images of skin. It presents experimental results comparing the DCU-Net and U-Net models on fingertip and palm image datasets, finding that DCU-Net achieved better performance for segmentation and potential for transfer learning across datasets. Future work could include training on larger datasets, accounting for temporal variations, generalizing to other body parts, using 3D models, and collecting more annotations.
Rapid object detection using boosted cascade of simple features (Hirantha Pradeep)
1. The document presents the seminal work of Viola and Jones on rapid object detection using boosted cascades of simple features.
2. It introduces integral images for fast feature evaluation and uses AdaBoost for feature selection and classifier training in a cascade structure.
3. The cascade approach combines classifiers such that earlier ones rapidly reject negatives while later ones focus on positives, achieving real-time detection rates.
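The cascade's control flow is simple: stages run in order, and any stage may reject the window immediately, so most windows exit cheaply at the first stage. A minimal sketch of that early-rejection structure (illustrative, not the original implementation):

```python
def cascade_classify(window, stages):
    """Run classifier stages in order; a single rejection ends evaluation.
    `stages` is a list of predicates; only windows passing every stage
    are reported as positives."""
    for stage in stages:
        if not stage(window):
            return False  # early rejection: most windows exit here cheaply
    return True
```

Because almost all sub-windows in an image contain no face, the average cost per window is close to the cost of the first, cheapest stage, which is what makes the detector real-time.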
Jaroslaw Szymczak presented an approach for automatic image moderation in classified listings. The approach uses machine learning techniques including convolutional neural networks (CNNs) to extract image features and eXtreme Gradient Boosting (XGBoost) to combine image and listing features. To address class imbalance between acceptable and unacceptable images, the training data was undersampled from a 99:1 ratio to a 9:1 ratio. Key evaluation metrics for the imbalanced data include ROC AUC, PR AUC, and precision or recall at fixed thresholds of the other. The trained models are deployed into a live service using Flask, containerized with Docker, and monitored for performance using Grafana.
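The undersampling step described above (99:1 down to 9:1) keeps every minority-class example and samples the majority class down until the target ratio holds. A sketch of that step (function and parameter names are illustrative; the seed is fixed for reproducibility):

```python
import random

def undersample(majority, minority, target_ratio=9, seed=0):
    """Downsample the majority class to target_ratio times the minority size;
    the minority class is kept in full."""
    rng = random.Random(seed)
    n_keep = min(len(majority), target_ratio * len(minority))
    kept_majority = rng.sample(majority, n_keep)
    return kept_majority, minority
```

Note that undersampling changes the base rate the model sees, which is one reason the summary stresses threshold-based metrics (precision/recall at fixed operating points) rather than raw accuracy.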
The document discusses automatic image moderation in classified ads. It outlines an approach using machine learning to classify images as appropriate or inappropriate. Key aspects include using convolutional neural networks to extract image features, combining image and listing metadata, dealing with class imbalance, developing batch processing pipelines, and monitoring a live classification system. The overall goal is to automatically moderate millions of images uploaded daily to classified ad platforms.
DroidCon Cluj 2018 - Hands on machine learning on android
3. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com
Machine Learning
Speaker:
ANCA CIURTE - AI Team Lead at Softvision
4.
Outline
● Why machine learning on Android?
● Mostly:
○ Some insights about Object Detection algorithms
○ Practical example in Tensorflow
○ Data gathering and labeling
○ Model training
● Hopefully:
○ It will inspire you to dig deeper
○ It won’t confuse you too much :)
5.
Machine learning
Why machine learning on Android?
9.
Why machine learning on Android?
● Object detection
○ Is a very common Computer Vision problem
○ Identifies the objects in the image and
provides their precise location
● Why is it useful?
○ StreetView, e.g. face blurring
○ Self-driving cars, e.g. pedestrian detection
● Object detection: impact of deep learning
○ Deep convnets significantly improved both accuracy and processing time
● Why on Android?
○ We live in an era in which mobile has taken over
○ Running on-device makes it possible to deliver interactive, real-time applications
○ The latest phones offer substantial computing power
10.
Machine learning
Some insights about Object Detection
12.
Intuition about the convolution
[Figure: Input image * Convolution kernel (weights) = Feature map]
Another way to understand the convolution operation: the kernel weights are the network’s parameters, and sliding one kernel across the input image produces one feature map of the convolution layer.
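The operation in the diagram boils down to a few lines of code. A minimal pure-Python sketch (the image and the vertical-edge kernel below are illustrative values, not the ones pictured on the slide):

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (stride 1, no padding): slide the kernel
    window over the image and take a weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for i in range(len(out)):
        for j in range(len(out[0])):
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

# A vertical-edge kernel responds where dark columns meet bright ones.
image = [[0, 0, 1, 1]] * 4   # dark left half, bright right half
kernel = [[1, 0, -1]] * 3    # the learned weights of one feature map
print(conv2d(image, kernel))  # -> [[-3, -3], [-3, -3]]
```

In a convnet these kernel weights are not hand-picked as here; they are exactly the parameters learned during training.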
13.
Image classification with convnets
● Dataset
○ e.g. Cifar-10 dataset:
■ consists of 60000 32x32 colour images in 10 classes, with 6000 images per class
■ there are 50000 training images and 10000 test images
● Training phase
○ e.g. VGG-16 network
○ input: labeled images (x, y)
○ repeat until convergence => w*:
■ forward propagation (given w_l, compute predictions)
■ loss function: compare the predictions with the labels y
■ backward propagation (compute w_{l+1} by minimizing the loss)
● Testing phase
○ Use the trained model to classify new instances
○ Output: the predicted class
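The training loop above (forward pass, loss, backward pass, repeat until convergence) can be sketched in miniature. As an assumption for brevity, a 1-D logistic regression stands in for the convnet; the mechanics of the loop are the same:

```python
import math

def train(data, lr=0.5, epochs=200):
    """Gradient-descent loop mirroring the slide: forward pass ->
    loss -> backward pass -> weight update, repeated for a fixed
    number of epochs as a stand-in for 'until convergence'."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in data:
            y_hat = 1.0 / (1.0 + math.exp(-(w * x + b)))  # forward pass
            dw += (y_hat - y) * x                          # backward pass:
            db += (y_hat - y)                              # dL/dw and dL/db
        w -= lr * dw / len(data)                           # update: w_{l+1}
        b -= lr * db / len(data)
    return w, b

# Toy labeled data (x, y): the label is 1 exactly when x > 0.
data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w, b = train(data)
predict = lambda x: 1 if w * x + b > 0 else 0   # testing phase
print([predict(x) for x, _ in data])  # -> [0, 0, 1, 1]
```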
18.
Relation between classification and object detection
● We have an accurate way of classifying images
○ e.g.: does this image contain a pedestrian?
● But how can we say WHERE the pedestrian is?
Solution:
● Sliding window
○ strategy:
■ split the image into fragments and classify them independently
○ challenges:
■ how to deal with various object sizes, aspect ratios, object overlap and multiple responses
○ problem: the CNN must be applied to a huge number of locations and scales - very computationally expensive!
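One way to see why the naive sliding window is so expensive is simply to count the crops the CNN would have to classify. The image size, window shapes, and stride below are illustrative values:

```python
def sliding_windows(img_w, img_h, win_sizes, stride):
    """Enumerate every (x, y, w, h) crop a sliding-window detector
    would have to classify: each window shape is shifted across the
    image `stride` pixels at a time."""
    for w, h in win_sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)

# A few window shapes cover different object sizes and aspect ratios.
sizes = [(64, 128), (96, 192), (128, 128), (128, 256)]
windows = list(sliding_windows(640, 480, sizes, stride=16))
print(len(windows))  # -> 2770 crops, each needing a CNN forward pass
```

Thousands of forward passes per frame is exactly the cost that R-CNN-style proposal methods and single-shot detectors were designed to avoid.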
19.
R-CNN (Region-based convolutional neural network)
Two steps:
● Select object proposals: Selective Search algorithm
○ its precision is too low for it to serve as an object detector on its own, but it works fine as a first step in the detection pipeline
● Apply a strong CNN classifier to each selected proposal
Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014
21.
It outperforms all the previous object detection algorithms
Limitations:
● Depends on an external hypothesis-generation algorithm
● Needs to rescale object proposals to a fixed resolution
● Redundant computation - all features are computed independently, even for overlapping proposal regions
23.
Fast R-CNN
From R-CNN to Fast R-CNN:
● input: image + region proposals
● region pooling on “conv5” feature map for feature
extraction
● softmax classifier instead of SVM classifier
● End-to-end multi-task training:
○ the last FC layer branches into two sibling
output layers:
■ one that produces softmax
probability estimates over K object
classes
■ another layer that outputs the
bounding box coordinates for each
object.
Advantages:
● Higher detection quality (mAP) than R-CNN
● Training is single-stage
● Training can update all network layers at once
● No disk storage is required for feature caching
Girshick, “Fast R-CNN”, ICCV 2015
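The "region pooling on the conv5 feature map" step can be sketched as RoI max pooling. This is a simplified integer-grid version for illustration, not the exact Fast R-CNN implementation:

```python
def roi_max_pool(fmap, roi, out_h, out_w):
    """Fast R-CNN-style RoI max pooling: divide the feature-map region
    roi = (x0, y0, x1, y1) into an out_h x out_w grid and take the max
    in each cell, so every proposal yields a fixed-length feature
    regardless of its size."""
    x0, y0, x1, y1 = roi
    out = []
    for i in range(out_h):
        ya = y0 + (y1 - y0) * i // out_h
        yb = max(y0 + (y1 - y0) * (i + 1) // out_h, ya + 1)
        row = []
        for j in range(out_w):
            xa = x0 + (x1 - x0) * j // out_w
            xb = max(x0 + (x1 - x0) * (j + 1) // out_w, xa + 1)
            row.append(max(fmap[y][x] for y in range(ya, yb)
                                      for x in range(xa, xb)))
        out.append(row)
    return out

# A 4x4 region of a toy 8x8 feature map pooled to a fixed 2x2 output.
fmap = [[y * 10 + x for x in range(8)] for y in range(8)]
print(roi_max_pool(fmap, (2, 2, 6, 6), 2, 2))  # -> [[33, 35], [53, 55]]
```

Because every proposal is reduced to the same fixed size, the fully connected layers that follow can be shared across all proposals, which is what removes R-CNN's redundant per-proposal computation.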
24.
Faster R-CNN
Faster R-CNN = Fast R-CNN + RPN (Region Proposal Network)
● RPN
○ removes the dependency on an external ROI-generation method
○ is a convolutional network trained end-to-end
○ generates a list of high-quality region proposals (bbox coordinates + objectness scores)
● Then the RPN and Fast R-CNN are merged into a single network by sharing their convolutional features
○ predicts the class of the objects + a refined bbox position
○ shared convolutional features enable nearly cost-free region proposals
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN: Towards
Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
26.
SSD (Single shot detector)
● Extra feature layers
○ additional convolutional feature layers of different sizes are placed at the end of the base net
○ each added feature layer produces a set of detection predictions, allowing predictions at multiple scales
○ this design leads to simple end-to-end training
● ROIs proposal
○ the output space of region proposals contains a fixed set of default boxes over different aspect ratios and scales per feature map location
○ for each default bounding box, predict:
■ the shape offsets Δ(cx, cy, w, h) and
■ the confidences for all object categories (c1, …, cp)
● Non-maximum suppression
[Figure: default boxes on an 8x8 and a 4x4 feature map]
Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
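The non-maximum suppression step that produces SSD's final detections can be sketched directly; the boxes and scores below are made-up values for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Non-maximum suppression: visit boxes by descending score and
    keep a box only if it overlaps no already-kept box by more than
    `thresh` IoU."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep

# Two near-duplicate detections of one object, plus a distant one.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]: best of the pair + the far box
```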
29.
Lots of variables to set up ...
● base net:
○ VGG16
○ ResNet101
○ InceptionV2
○ InceptionV3
○ Inception-ResNet-V2
○ MobileNet
● Object detection architecture:
○ R-CNN
○ Fast R-CNN
○ Faster R-CNN
○ SSD
● Input image resolution
● Number of region proposals
● Frozen weights - for fine tuning
Takeaways:
● Faster R-CNN is slower but more accurate
● SSD is much faster but not as accurate (and is therefore a good choice for mobile apps)
Jonathan Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017
Speed/accuracy trade-offs
Compare modern convolutional object detectors
30.
Coding time
31.
Coding time
Problem to solve:
- a mobile app for real-time clothes detection
- class categories: Top, Pants, Shorts, Skirt and Dress
Frameworks:
● TensorFlow Object Detection API
- made by Google
- an open source framework built on top of TensorFlow that
makes it easy to construct, train and deploy object detection
models
- input: images + labels
- output: inference graph (.pb format)
● LabelImg
- an open source graphical image annotation tool
- annotations are saved as XML files in PASCAL VOC format,
the format used by the ImageNet dataset
32.
Coding time: step by step
● Create dataset and split it into: train (70%) and test (30%) folders
● Label images with LabelImg tool (output: .xml files for each image in dataset)
● Convert .xml to .csv (use dataset/xml_to_csv.py script; output: train.csv, test.csv)
● Convert to TFRecord format
○ set paths (from ../models/research):
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/object_detection
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
○ edit generate_tfrecord.py file and change the label map + path to the train/test folder:
○ finally execute the generate_tfrecord.py script in Terminal:
python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
○ output: train.record, test.record
● Training
○ create a label map: label_map.pbtxt
○ optional, but recommended :), choose a pretrained model from here
○ prepare the .config file: .../models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config
○ run training script (from ../models/research/object_detection):
python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=ssd_mobilenet_v1_pets.config
● Export inference graph:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path pipeline.config
--trained_checkpoint_prefix=training/model.ckpt-10750 --output_directory=inference_graph
output: the model in .pb format
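The xml_to_csv step above amounts to pulling one row per bounding box out of each LabelImg PASCAL VOC file. A minimal sketch with a hypothetical annotation (the real dataset/xml_to_csv.py walks the train and test folders and writes train.csv/test.csv):

```python
import xml.etree.ElementTree as ET

# A hypothetical LabelImg annotation for one image; the "Dress" class
# is from the deck's categories, the filename and coordinates are made up.
VOC_XML = """
<annotation>
  <filename>dress_001.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>Dress</name>
    <bndbox><xmin>120</xmin><ymin>40</ymin><xmax>380</xmax><ymax>460</ymax></bndbox>
  </object>
</annotation>
"""

def voc_to_rows(xml_text):
    """One CSV row per <object>: filename, image size, class, box."""
    root = ET.fromstring(xml_text)
    fname = root.findtext("filename")
    w = int(root.findtext("size/width"))
    h = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        rows.append((fname, w, h, obj.findtext("name"),
                     int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                     int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return rows

print(voc_to_rows(VOC_XML))
```

These rows are exactly the columns generate_tfrecord.py then expects when packing the dataset into train.record/test.record.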
33.
e-mail: anca.ciurte@softvision.ro
Q&A
34.
Integrating with Android
Speaker:
MIHALY NAGY - Android Community Influencer at Softvision
35.
Android + TensorFlow
36.
Android + TensorFlow
● Model File
● [Labels File]
● tensorflow-android dependency
● Boilerplate
● Integrate TF to process each frame
37.
Android + TensorFlow
38.
Android + TensorFlow
[Figure: each camera frame is converted to a Bitmap and passed to TensorFlow, which returns Recognition results]
39.
Android + TensorFlow
Follow Along:
http://goo.gl/SYHSb7
https://github.com/code-twister/tf_example
40.
Coding time
41.
Thank You!
Editor's notes
Running on mobile makes it possible to deliver interactive and real time applications in a way that’s not possible when depending on the internet connection
Multiple scales and aspect ratios are handled by search windows of different sizes and aspect ratios, or by image scaling
From R-CNN to Fast R-CNN:
region pooling on “conv5” feature map for feature extraction
softmax classifier instead of SVM classifier
Multitask training:
the last fc layer branches into two sibling output layers:
one that produces softmax probability estimates over K object classes
another layer that outputs the bounding box coordinates for each object.
First, a CNN is applied on the whole original image with several convolutional (conv) and max pooling layers to produce a conv feature map.
Then, for each object proposal a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map, which is fed into a sequence of fully connected (fc) layers.
fc layers finally branch into two sibling output layers:
one that produces softmax probability estimates over K object classes
another layer that outputs the bounding box coordinates for each object.
A Region Proposal Network (RPN) takes an image
(of any size) as input and outputs a set of rectangular
object proposals, each with an objectness score.
SSD approach:
produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes
followed by a non-maximum suppression step to produce the final detections.
Network generates scores for each default box
Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location
Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
There are several object detection algorithms
The question is: how well do they compete with each other?
We define several meta-parameters that influence detector performance
Critical points on the curve that can be identified:
mAP = mean average precision
[Huang et al.] measured the influence of these metaparams on accuracy and speed
Jonathan Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017
Recognition refers to the objects detected, not the process