Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
2. Copyright © 2017 LUXOFT 2
What This Talk is About
Key Decisions in a Deep Learning Computer Vision Project
1. Data
2. Compute Pipeline
3. Training and Tuning
• Data acquisition, and how much is enough
• Data processing pipeline
• Compute infrastructure
• AI platform and model
• Data preparation
• Network training
• Fine tuning
Why We Are Giving This Talk
Our AI Practice:
- Computer vision
- ...and non-vision AI
Pharma
Agriculture
Retail
Industrial
Automotive
Paradigm Shift in Software R&D
[Diagram: traditional vs. AI-driven software R&D]
Traditional stages: Knowledge (Algorithm development); System design; Coding / testing; Integration, roll-out and maintenance
Traditional roles: SME; System Architects; Developers; IT Support
AI-driven stages: Access to Data; Data selection; Learning Pipeline Design; Data preparation, Training, Tuning; Integration, roll-out and maintenance
AI-driven roles: SME; AI Architects; Data specialists; Data and AI Support
Callouts: Focus of this talk; New Professions!
Paradigm Shift: Another Way to Look At It
Was | Becoming
Subject Matter Experts | Data
Algorithm Development | No Algorithm Development
Domain (Vertical) Knowledge | AI (System) Knowledge
Decision Quality is Art | Decision Quality is Predictable
The single most important investment you can make in an AI project is Data Strategy
Data Sourcing: It Starts with Data!
Acquire Data → Prepare Data
Six data guidelines that we consider important
…to achieve the best results (accuracy, false positives, false negatives).
1. Data coverage - needs to be representative - cover all cases (more than you think)
2. Data balance (normalization) - about an equal amount of data for each class/case/scenario
3. Amount of data
4. Data formatting - make it work with existing/selected DNN architectures: ROI selection, breaking a big picture into smaller ones, video into frames, etc.
5. Data synthesis and augmentation - “cat and mirror reflection of a cat are two different cats”. It is often easier to transform existing data than to obtain new data
6. Data annotation
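A minimal sketch of the "breaking big picture into smaller" formatting step from guideline 4. It only computes crop boxes; the tile size, stride and the function names `tile_origins` and `tile_boxes` are illustrative assumptions, not taken from the talk.

```python
def tile_origins(size, tile, stride):
    """Start offsets along one axis so that tiles cover the full extent."""
    if size <= tile:
        return [0]
    origins = list(range(0, size - tile + 1, stride))
    if origins[-1] != size - tile:
        origins.append(size - tile)  # extra tile flush with the far border
    return origins


def tile_boxes(width, height, tile, stride):
    """Return (left, top, right, bottom) crop boxes covering the whole image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, stride)
        for x in tile_origins(width, tile, stride)
    ]


# Hypothetical 1000x600 frame cut into overlapping 512x512 network inputs.
boxes = tile_boxes(1000, 600, tile=512, stride=384)
print(len(boxes))   # 6
print(boxes[0])     # (0, 0, 512, 512)
print(boxes[-1])    # (488, 88, 1000, 600)
```

Overlapping tiles (stride smaller than tile) are one way to avoid cutting an object exactly at a tile border; whether that matters depends on the data.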
Example: Pharma Data
• Crystal clear media before dissolution
• Just after the sample is dropped; it remains a single solid piece
• The sample has started to swell but is still a single piece
• The sample continues to swell and produces many small particles: low-contrast media
Pharma Improper Data Sampling – Imbalanced Set
Accuracy paradox
Labeled video sample, classes: clear media; sample dropped; sample dissolving 1; sample dissolving 2; precipitate presented; cloudy media
Problem: dataset imbalance
• Tricky to identify
What to do:
• Change the performance metric for the trained network
• Use penalized models: adjust the cost function for imbalance
• Decompose large classes into smaller ones
• Resample: over- and under-sample to balance
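The remedies above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the 95/5 class split is invented, balanced accuracy stands in for "a different performance metric", and inverse-frequency weights stand in for a penalized cost function.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def inverse_frequency_weights(labels):
    """Per-class loss weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def oversample(samples, labels, seed=0):
    """Randomly repeat minority samples until each class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out.extend((sample, label) for sample in items + extra)
    rng.shuffle(out)
    return out


# Hypothetical frame labels: 95 "clear" frames and 5 "cloudy" frames.
labels = ["clear"] * 95 + ["cloudy"] * 5
always_clear = ["clear"] * 100          # a useless model that ignores its input
print(accuracy(labels, always_clear))           # 0.95 (the accuracy paradox)
print(balanced_accuracy(labels, always_clear))  # 0.5
print(inverse_frequency_weights(labels)["cloudy"])  # 10.0
```

Oversampling duplicates frames rather than adding information, so it is usually combined with augmentation; under-sampling the large class is the mirror-image option.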
Pharma Improper Data Sampling (contd.)
Different optimization approaches to find separate thresholds for each class (1, 2, 3)
3 options of uniform class distribution (balanced datasets)
Proper data shuffling is important
• Common approach: the conventional 2-stage shuffling used by NVIDIA DIGITS
• Our custom data shuffling gave us up to a 1.3% increase in accuracy compared with the conventional scheme
[Chart: Top-1 accuracy (%) for Balanced dataset 1, Balanced dataset 2, Balanced dataset 3]
Machine Learning vs. Deep Learning
[Ian Goodfellow et al., ISBN 978-0262035613]
Classic ML
Deep Learning
Machine Learning vs. Deep Learning
[Ian Goodfellow et al., ISBN 978-0262035613]
100% Deep Learning: possible but not practical
• In this diagram, it seems that deep learning works from raw data. In reality, this is the ideal case
  • Needs infinite data + infinite compute
• Practical implementations still sit between Classic ML and DL, with a lot of upfront non-neural effort in data selection and preparation.
That is why we need to build a data processing pipeline
Visual Data Processing Pipeline
Feeding raw data straight to a deep network is still a dream: asymptotically possible, but practically inefficient.
• Therefore we pipeline processing blocks, like this:
[Figure: pipeline of processing blocks]
Example: Pharma Processing Pipeline (One of the AI Parts)
• Purpose:
  • Region of interest (ROI) detection
  • Camera calibration
  • Detect shakes and shifts
• Solution
  • Based on YOLO (You Only Look Once, http://pjreddie.com/darknet/yolo/)
  • Processing in real time
Example: Pharma Processing Pipeline (One of the Non-AI Parts)
• Wobbling detection
  • Represent video in temporal space
  • Search for sinusoid amplitude
• RPM detection
  • Treat paddle as a signal
  • Fourier transform for frequency detection
• Implemented for 20 FPS
• Calibration required
[Figure: Spatial (left) and temporal (right) representation of video data]
[Figure: ROI (left) and signal (right) of paddle appearance]
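A rough sketch of the RPM idea: treat the paddle's appearance in the ROI as a 1-D signal sampled at the frame rate and pick the strongest frequency. The signal here is synthetic, the function name `dominant_frequency_hz` is hypothetical, and the sketch assumes one signal period per paddle revolution (a real paddle may produce several per revolution, which is one reason calibration is required). A naive DFT keeps the example dependency-free; `numpy.fft.rfft` would be the usual choice in practice.

```python
import cmath
import math


def dominant_frequency_hz(signal, fps):
    """Frequency (Hz) of the strongest non-DC component, via a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]   # remove the DC offset
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):          # bins up to the Nyquist frequency
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fps / n


FPS = 20  # frame rate quoted on the slide
# Synthetic stand-in for the ROI signal: 75 RPM (1.25 Hz), 8 seconds of video.
signal = [math.sin(2 * math.pi * 1.25 * t / FPS) for t in range(160)]
rpm = 60 * dominant_frequency_hz(signal, FPS)
print(rpm)  # 75.0
```

The frequency resolution is fps / n, here 0.125 Hz or 7.5 RPM, so longer windows give finer RPM estimates at the cost of slower response.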
Platform Selection Guidelines
Platform selection (Caffe / Torch / TensorFlow / etc.) is like selecting a Java app server or a web server: there are many similarities and it is hard to choose. These are the criteria we use:
• Availability of models for your task
• Deployment compatibility: both enterprise and embedded
  • Works with your cloud? Like Amazon AWS or MS Azure?
  • Works on your device? Like ARM or NVIDIA GPUs?
• Distributed processing choices: Cloud / Edge
• Embedded optimization opportunities
• Maintenance considerations
• Support for training models, tools and scenarios <= important to think ahead
Model Selection Guidelines
We often hear: “use the latest, greatest model”.
However, we’d rather use a “simple model for simple data”, the same idea as the “good-enough” concept.
Important selection criteria we encounter
• Production accuracy
• “Cascadability”
• Production run-time speed
• Training and fine-tuning scenarios: amount and kind of required training data, effort to train
Pharma example: Winner is AlexNet (yes!): quick learner, fast to run, good accuracy on our data
[Chart: Accuracy vs. Training Epoch]
Retail Example: “Cascadability” of a Search Network
• Total number of different classes ~100,000
• We use a cascade of networks
  • Level 1: Type-search network
  • Level 2: Different classification networks for each type
• Conventional accuracy metrics for type-search networks:
  • mAP, Precision, Recall
  • These metrics are typically computed at a fixed IoU threshold of 0.5
  • IoU = Intersection over Union
• Classification accuracy strongly depends on IoU
  • IoU = 0.5: classification probability ≈ 0.68
  • IoU > 0.7: classification probability > 0.90
What is IoU: [Figure: example boxes at IoU=1 and IoU=0.5]
[Chart: Classification Probability vs. IoU]
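For reference, IoU for two axis-aligned boxes can be computed as below. The box format (x1, y1, x2, y2) and the example coordinates are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width/height; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


truth = (0, 0, 100, 100)
print(iou(truth, truth))               # 1.0
print(iou(truth, (0, 0, 100, 50)))     # 0.5: detection covers half the object
print(iou(truth, (25, 0, 125, 100)))   # 0.6: detection shifted by a quarter width
```

A detection that passes the usual 0.5 threshold can still crop away half of the object, which is consistent with the drop in second-level classification probability reported on this slide.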
AI System Development Is More Iterative
[Diagram: Traditional SW Engineering vs. Deep Learning SW Engineering]
Example: Pharma Iterative Training Workflow
Database: ~5,000 images for each class
Uniform distribution of classes required
[Chart axis: Training Epoch]
Example: Pharma Iterative Training Workflow
[Figure: per-frame labels, DNN vs. Human. Legend: Crystal clear media; Sample dropped; Sample swelling; Sample dissolving; Precipitate presented; Media foggy]
Iterative DNN training
Use the trained DNN to fix data labelling, following the "Pseudo-Label" technique
Visualize your data
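One way to read the label-fixing loop: after each training round, list the samples where the network confidently disagrees with the human label and send only those back for review. The sketch below is an assumption about how such a step could look, not the authors' workflow; `flag_suspect_labels`, the 0.9 confidence cut-off and the sample data are all hypothetical.

```python
def flag_suspect_labels(human_labels, predictions, min_confidence=0.9):
    """Indices where a confident DNN prediction disagrees with the human label.

    predictions is a list of (predicted_label, confidence) pairs, one per sample.
    """
    suspects = []
    for i, (human, (predicted, confidence)) in enumerate(
        zip(human_labels, predictions)
    ):
        if predicted != human and confidence >= min_confidence:
            suspects.append(i)
    return suspects


human = ["clear", "dropped", "swelling", "swelling", "dissolving"]
dnn = [
    ("clear", 0.99),
    ("dropped", 0.97),
    ("dissolving", 0.95),   # confident disagreement: worth a second look
    ("dissolving", 0.55),   # disagreement, but the network is unsure
    ("dissolving", 0.98),
]
print(flag_suspect_labels(human, dnn))  # [2]
```

Reviewed labels go back into the training set and the network is retrained, which is what makes the workflow iterative.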
Model Training: Time to Use Your Data!
Acquire Data → Prepare Data
Six data guidelines that we consider important
…to achieve the best results (accuracy, false positives, false negatives).
1. Data coverage - needs to be representative - cover all cases (more than you think)
2. Data balance (normalization) - about an equal amount of data for each class/case/scenario
3. Amount of data
4. Data formatting - make it work with existing/selected DNN architectures: ROI selection, breaking a big picture into smaller ones, video into frames, etc.
5. Data synthesis and augmentation - “cat and mirror reflection of a cat are two different cats”. It is often easier to transform existing data than to obtain new data
6. Data annotation
Data Preparation and Annotation Scenarios
Data | Annotation (Labeling) | Use Cases
Real / raw | Manual | Sometimes a good choice for a total greenfield. Needs human resources. Tools dramatically increase the efficiency.
Real / raw | Computer, then human | Rudimentary AI provides a draft annotation. Then humans confirm/correct. Example: house number captcha.
Real / raw | Human, then computer | Humans annotate the 1st frame. Then existing CV methods provide object tracking on subsequent frames. Very efficient for dynamic scenes.
Augmented | Automated | Use a high-quality labeled dataset and augment it to simulate real-life conditions. Example: the “German Traffic Sign” dataset could have been almost entirely synthesized through augmentation.
Synthetic | Automated | Use 3D rendering software and scripts to generate scenes. Our construction example.
Retail Example. Data Augmentation and Synthesis
[Figure panels: Normal Light Conditions; WB Deviations; ISO Simulation; Year Synthesis]
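A toy sketch of three augmentations in the spirit of these panels and of the "mirror reflection" guideline. It assumes that "WB deviations" can be approximated by per-channel gains and "ISO simulation" by additive Gaussian noise; both are simplifications chosen for illustration, and the gains and sigma are made-up values. Images are plain nested lists of RGB tuples to keep the example dependency-free.

```python
import random


def clip(value):
    """Round to an integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def mirror(image):
    """Horizontal flip: reverse every pixel row."""
    return [row[::-1] for row in image]


def white_balance(image, gains):
    """Scale the R, G, B channels by separate gains."""
    return [
        [tuple(clip(c * g) for c, g in zip(px, gains)) for px in row]
        for row in image
    ]


def iso_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise to every channel of every pixel."""
    rng = random.Random(seed)
    return [
        [tuple(clip(c + rng.gauss(0.0, sigma)) for c in px) for px in row]
        for row in image
    ]


image = [[(100, 100, 100), (200, 50, 25)],
         [(0, 0, 0), (255, 255, 255)]]
warm = white_balance(image, gains=(1.1, 1.0, 0.9))   # hypothetical warm cast
noisy = iso_noise(image, sigma=8.0)                   # hypothetical noise level
print(mirror(image)[0][0])  # (200, 50, 25)
print(warm[0][0])           # (110, 100, 90)
```

Each transform keeps the label unchanged, so one annotated image yields several training samples.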
Railway Safety Example: Synthetic Data for Training
Lessons Learned
[Chart: model effort vs. data effort]
Lessons Learned
• The most important decision is data strategy
• Data acquisition: Need lots of data, full coverage, well balanced.
• Model decisions: The best overall may not be the best for you.
• Pipeline decisions: Cascade and combine classic and AI algorithms.
• Training: A lot can be achieved by data preparation and synthesis. Annotation tools save millions.
• Be prepared to stop. AI development is a very iterative process
Resources
• Our website: www.luxoft.com
• Our talk at last year's Summit (2016) about computer vision pipeline optimization:
  • Available in full on the Embedded Vision Alliance website:
https://www.embedded-vision.com/platinum-members/luxoft/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
Check out the demos at the LUXOFT booth!
Extremely optimized artificial intelligence and computer vision pipelines running on low-power embedded platforms and GPU architectures: stereo vision, video and image processing, DNNs; as well as our Hybrid AI Platform that distributes and manages both deep learning and classic computation across cloud and edge devices.
*photo of our booth last year
Alexey Rybakov
ARybakov@luxoft.com
LUXOFT
4400 Bohannon Dr Ste 235
Menlo Park, CA 94025
Thank you!
:)
Editor's notes
SF Training Tuning?
Traditional development is often a “one strong opinion” approach. Experts rely on about 50k-100k patterns [PE Ross 2006, Florida State]. But it is like going to a doctor: each one will suggest a different approach.
AI-driven development is ideally(!) a 100% objective, data-driven approach.
Jeff: This seems optimistic. In practice, doesn’t much depend on the (human) choice of AI algorithms, training data, and the like?
Alexey: will address verbally
This is a very fundamental shift; it renders many value chains obsolete
Even with a completely new vertical domain, we can typically rely on available community information from other domains to make assumptions about target accuracy and other parameters
Learning frameworks don’t do this (DIGITS / Intel Deep Learning SDK)
https://en.wikipedia.org/wiki/Accuracy_paradox
https://en.wikipedia.org/wiki/Precision_and_recal
In this case couldn't use
See ODT:
Change colors
1- 3 Datasets by using different resampling
Night cat vs day cat
Chose the best (#2)
ImageNet 2016 was won by a Chinese team, including by manual shuffling
Pharma is YOLO + Alexnet
YOLO was tuned to very precise sub-pixel ROI
Top 1%
Epochs (training time)
Framework (like Caffe / DIGITS) defines epoch
SF we need to explain this. Graph? mAP metric?
Intersection over Union
Problem: 100,000 classes. No network can handle this out of the box
White = human error
SF use AP’s dataset here
Link?