Human-Efficient Discovery of Training Data for Visual ML
1. Human-Efficient Discovery of
Training Data for
Visual Machine Learning
Thesis Proposal
Ziqiang (Edmond) Feng
Committee:
Mahadev Satyanarayanan (Chair)
Martial Hebert
Roberta Klatzky
Padmanabhan Pillai (Intel Labs)
2. Agenda
• The Problem
• Thesis Statement
• Overview of Eureka
• Research Thrusts
• Related Work
• Timeline
3. Deep Learning for Computer Vision
[Example tasks: classification, detection, segmentation, activity recognition; classes such as bird, cat, dog]
4. Training Data Is A Key Ingredient
(1) Forward pass: raw pixels → Deep Neural Network → prediction
(2) Backward pass (aka back-propagation): prediction ⊗ label (ground truth) → error → Deep Neural Network

A training example = (raw pixels, label/ground truth)
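The forward/backward pass above can be sketched as a minimal gradient-descent step on a toy one-weight linear model; all names and values here are illustrative, not part of any real training system.

```python
# Minimal sketch of one training step: forward pass, error, backward pass.
# Toy linear model y = w * x with squared-error loss; all values illustrative.

def train_step(w, x, label, lr=0.1):
    pred = w * x                    # (1) forward pass: raw input -> prediction
    error = pred - label            # compare prediction with ground-truth label
    grad = 2 * error * x            # (2) backward pass: d(loss)/dw by chain rule
    return w - lr * grad            # update weight to reduce the error

w = 0.0
for _ in range(50):                 # repeatedly train on one example
    w = train_step(w, x=1.0, label=2.0)
print(round(w, 3))                  # w converges toward the label, 2.0
```

A real DNN repeats exactly this loop, only with millions of weights and the chain rule applied layer by layer.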
8. The Training Data Problem of Domain Experts
(scientists, military, medical doctors, etc.)

• Masked palm civet (Paguma larvata): transmitter of SARS during its 2003 outbreak
• BUK-M1: believed to have shot down MH17, killing 298 people in 2014
• Nuclear atypia in pathological images: a cue of several diseases and cancers
9. Why Is It Difficult?
Crowds are not experts.
Domain-specific expert knowledge is required.

Interesting phenomena are rare.
One must scan through a lot of data to find a few positives.

Access restrictions.
Only one or a few experts can label the data.
10. How can a single domain expert discover
thousands of positive examples of a rare
object from unlabeled data efficiently?
11. Thesis Statement
The manual effort of discovering a large training set for visual machine
learning can be reduced by a system combining:
• Early discard
• Just-in-time machine learning
• The ability to create more accurate filters without writing new code
This approach is efficient in:
• Different computing landscapes
(e.g., edge computing and smart storage)
• Different problem domains
(e.g., object detection in images and activity recognition in videos).
12. Agenda
• The Problem
• Thesis Statement
• Overview of Eureka
• Research Thrusts
• Related Work
• Timeline
13. Eureka
A methodology and a system.
• For finding rare phenomena in unlabeled visual data
Goal: utilize an expert’s time efficiently
• Reduce expert’s idle time
• Improve candidate examples’ quality
14. Itemizer (scoping)

Data Source (images, videos, map data, etc.) → Item Processor (cascaded filters F1, F2, …) → User Interface

• Item: independent unit of early discard and display (e.g., a single image)
• Attribute: key-value pairs attached by filters; facilitate communication between filters and post-analysis
• Filter: examines items and tries to drop them; short-circuit evaluation of cascaded filter chains
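The item/attribute/filter abstractions can be sketched as follows; the class and function names are my illustration, not the actual Eureka API.

```python
# Sketch of a short-circuiting early-discard filter chain.
# An Item carries raw data plus attributes written by filters.

class Item:
    def __init__(self, data):
        self.data = data
        self.attributes = {}        # key-value pairs shared between filters

def run_chain(item, filters):
    """Apply filters in order; stop at the first one that drops the item."""
    for f in filters:
        if not f(item):             # filter examines item, may add attributes
            return False            # short-circuit: later filters never run
    return True                     # item survives and is shown to the expert

# Two toy filters: a cheap color test, then a more expensive score test.
def color_filter(item):
    item.attributes["blueish"] = item.data.get("blue", 0) > 100
    return item.attributes["blueish"]

def score_filter(item):
    item.attributes["score"] = item.data.get("score", 0.0)
    return item.attributes["score"] > 0.5

items = [Item({"blue": 200, "score": 0.9}), Item({"blue": 10, "score": 0.9})]
kept = [it for it in items if run_chain(it, [color_filter, score_filter])]
print(len(kept))                    # only the first item passes both filters
```

Because the second item fails the cheap color test, the expensive score filter never runs on it, which is the point of ordering a cascade from cheap to costly.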
15. Early Discard Filters
• Purpose: drop probably-negative data and narrow the search space
• Not to be taken as a “perfect detector”
• Reduce the demand for expert time & attention
• Examples:
• Sky-blue color for birds
• Bullet shape for rocket-propelled grenades (RPGs)
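A “sky-blue for birds” early-discard filter can be as crude as a fraction-of-bluish-pixels test; the thresholds below are illustrative, not tuned values from the system.

```python
# Illustrative early-discard filter: keep an image only if enough of its
# pixels look sky-blue. Pixels are (r, g, b) tuples in 0-255.

def is_sky_blue(pixel):
    r, g, b = pixel
    return b > 150 and b > r + 30 and b > g + 20   # rough cue, not a detector

def sky_filter(pixels, min_fraction=0.2):
    blue = sum(1 for p in pixels if is_sky_blue(p))
    return blue / len(pixels) >= min_fraction       # True = keep for review

sky_patch = [(90, 160, 220)] * 80 + [(100, 100, 100)] * 20
indoor    = [(120, 100, 80)] * 100
print(sky_filter(sky_patch), sky_filter(indoor))    # True False
```

Such a filter misses birds against green foliage, which is acceptable: its job is only to cheaply shrink the candidate pool, not to be right every time.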
20. Agenda
• The Problem
• Thesis Statement
• Overview of Eureka
• Research Thrusts
• Related Work
• Timeline
21. Eureka in Different Computing Landscapes

Edge Computing · Cloud Computing · Smart Storage
Image Data

Focus:
• Computer system efficiency (throughput, latency, etc.)
• Identify hardware and software bottlenecks
• Develop techniques to improve computational efficiency
22. Eureka in Different Problem Domains

Edge Computing
Image Data · Video Data · Other multidimensional data (e.g., whole-slide images, HD maps)

Focus:
• Domain-specific optimization
• Expressive programming abstractions
• User productivity
23. Research Thrusts: Progress

[Roadmap grid: Edge Computing / Cloud Computing / Smart Storage × Image Data / Video Data / Other multidimensional data (e.g., whole-slide images, HD maps)]
24. [Roadmap grid repeated; next thrust: Edge Computing + Image Data]
25. Why the Edge?
• Data is generated at the edge
• Sensors, cameras, smartphones, drones, self-driving cars, smart streetlights, etc.
• Edge computing is the answer to scalability
• Can’t afford to send all data to the cloud for computation
• US average Internet bandwidth (2017) = 19 Mbps
• Barely enough to stream a single 4K Netflix video
27. Experiments

Data:
• Yahoo! Flickr 100 Million (YFCC100M) images
• Unlabeled; real-life object distribution
• Evenly partitioned across cloudlets

Cloudlets:
• 8 cloudlets
• Intel Xeon E5-1650, 32 GB DRAM
• Nvidia GTX 1060 GPU

Workflow:
• 5 iterations for each target
• Start with SIFT, RGB color histogram, Difference of Gaussian, …
• Later: iteratively re-train an SVM on MobileNet features
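The iterative workflow can be sketched as follows. This is a toy stand-in, not the real pipeline: a tiny perceptron replaces the SVM, 2-D random points replace MobileNet features, and an oracle function replaces the human expert.

```python
# Sketch of Eureka's iterative loop: rank unlabeled items with the current
# filter, have the "expert" label the top candidates, re-train on all labels.
import random

random.seed(0)

def train(examples, epochs=30, lr=0.1):
    """Perceptron stand-in for the SVM re-trained each iteration."""
    w = [0.0, 0.0, 0.0]                              # two features + bias
    for _ in range(epochs):
        for x, y in examples:                        # y is +1 or -1
            score = w[0] * x[0] + w[1] * x[1] + w[2]
            if score * y <= 0:                       # misclassified: update
                w = [w[0] + lr * y * x[0],
                     w[1] + lr * y * x[1],
                     w[2] + lr * y]
    return w

def expert_label(x):
    """Oracle standing in for the human expert; positives have x[0] > 0.7."""
    return 1 if x[0] > 0.7 else -1

unlabeled = [(random.random(), random.random()) for _ in range(500)]
labeled, seen = [], set()
w = [1.0, 0.0, 0.0]                                  # crude seed filter
for it in range(5):                                  # 5 iterations, as above
    scored = sorted((x for x in unlabeled if x not in seen),
                    key=lambda x: -(w[0]*x[0] + w[1]*x[1] + w[2]))
    for x in scored[:50]:                            # expert labels top 50
        seen.add(x)
        labeled.append((x, expert_label(x)))
    w = train(labeled)                               # re-train on all labels
positives = sum(1 for _, y in labeled if y == 1)
print(positives, len(labeled))
```

The structure is what matters: each round, the improving filter concentrates the expert's 50 labels on likelier positives, and every label (positive or negative) feeds the next model.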
28. Edge + Image: Results
                                          Deer        Taj Mahal   Fire hydrant
Estimated base rate (prevalence)          0.07%       0.02%       0.005%
True positives collected in 5 iterations  111         105         74
Images labeled by user                    7,447       4,791       15,379
Images discarded by Eureka                2,104,076   2,542,889   2,734,070
29. Compare with Naïve Hand-Labeling
[Bar chart, log scale 1,000 to 1,000,000: images (TP+FP) the user inspected to collect ~100 true positives of Deer, Taj Mahal, and Fire hydrant, comparing naïve hand-labeling, single-pass early discard, and Eureka. Lower is better.]
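The gap in the chart can be reproduced from the Deer column of the results table above: at a 0.07% base rate, collecting 111 true positives by naïve labeling would mean inspecting roughly 111 / 0.0007 images, versus the 7,447 survivors the user actually labeled with Eureka.

```python
# Expected images to inspect to find k true positives at base rate p,
# assuming uniform prevalence (numbers from the Deer column above).
base_rate = 0.0007            # 0.07%
true_positives = 111
naive = true_positives / base_rate
eureka = 7447                 # images the user actually labeled with Eureka
print(round(naive), round(naive / eureka, 1))   # 158571 21.3
```

So even this back-of-the-envelope estimate puts Eureka at roughly a 21× reduction in the expert's inspection burden for the deer target.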
30. Effect of the Iterative Workflow
[Line chart: newly-discovered true positives per minute vs. cumulative minutes in workflow, for Deer (base rate 0.07%), Taj Mahal (0.02%), and Fire Hydrant (0.005%). Discovery rates fall with lower base rates; the Fire Hydrant search is bottlenecked by its low base rate and by computation (the user waiting). Higher is better.]
31. Proximity to Data Is Important
[Bar chart: processing throughput (images/s) of an RGB color histogram filter while throttling the bandwidth between compute and data to 10 Mbps, 25 Mbps, 100 Mbps, and 1 Gbps (LAN). US average: 18.7 Mbps (2017). Higher is better.]
32. [Roadmap grid repeated; next thrust: Cloud Computing]
33. Why the Cloud?
• Historically, many data sets have been centralized in the cloud
• Elasticity: easy to recruit more compute resources by adding VMs
• Trade off $$$ for better use of expert time
34. Edge vs. Cloud

Edge:
• Independent CPUs and disks
• Access to local disk is fast

Cloud (Amazon Web Services as an example):
• [EC2 instances ↔ network ↔ S3 storage]
• Elasticity leads to separation of the compute and storage layers
• The I/O stack adds extra latency
• Contention for shared bandwidth
36. What Can We Do in the Cloud?

• Use extra threads to pre-fetch data asynchronously
• Utilizes the many cores
• In practice: got throttled by the service provider
• Cache data for later re-access
• Utilizes the large main memory
• Useful if the workload revisits data items
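The prefetch-and-cache ideas above can be sketched with a background thread filling a bounded queue plus a simple in-memory cache; `fetch()` is a made-up stand-in for a cloud-storage GET, not a real API.

```python
# Sketch: hide cloud-storage latency by prefetching items on an extra thread
# and caching them for re-access. fetch() stands in for a storage GET.
import queue
import threading
import time

cache = {}

def fetch(key):
    time.sleep(0.01)                 # simulated storage latency
    return b"data-" + key.encode()

def prefetcher(keys, out):
    for k in keys:
        out.put((k, fetch(k)))       # runs ahead of the consumer
    out.put(None)                    # sentinel: no more items

keys = [f"img{i}" for i in range(10)]
q = queue.Queue(maxsize=4)           # bounded: don't prefetch unboundedly
threading.Thread(target=prefetcher, args=(keys, q), daemon=True).start()

processed = 0
while (item := q.get()) is not None:
    k, data = item
    cache[k] = data                  # later re-access hits memory, not storage
    processed += 1
print(processed)                     # 10
```

The bounded queue is the key design choice: it overlaps storage latency with computation while capping memory use, which matters once the provider starts throttling aggressive readers.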
37. [Roadmap grid repeated; next thrust: Smart Storage]
38. Eureka + Smart Storage
(work in progress)
Smart storage = execute application logic in on-disk controllers
• Today’s disk controllers are already small computers
Why do this?
• Storage is the first thing that scales with data
• Lower energy consumption
• Fast access to data
• Passing application knowledge to the storage system for optimization
Challenges
• Low compute capacity on device
• Difficulty of programming/debugging
39. Optimizing Image Storage for Eureka
• Object store semantics → no need for partial reads
• Read-only → no writes
• Read order doesn’t matter → reduce disk seeks, exploit caching, etc.
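One way to exploit the “read order doesn’t matter” property is to let the storage layer serve pending requests in on-disk order rather than arrival order; the object names and offsets below are made up for illustration.

```python
# Sketch: since Eureka doesn't care about read order, a smart store can
# sort pending object reads by physical offset to minimize seeks.
pending = {"img7": 9000, "img2": 1200, "img5": 4000}   # object -> disk offset

def schedule(reads):
    """Return object names in on-disk order instead of request order."""
    return sorted(reads, key=reads.get)

order = schedule(pending)
print(order)                         # ['img2', 'img5', 'img7']
```

This is the application knowledge mentioned above being passed down: a generic file system must preserve request semantics, but a Eureka-aware store is free to reorder for locality.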
40. [Roadmap grid repeated; next thrust: Video Data]
41. Activity Recognition in Video
(work in progress)
• Challenge 1: the extra time dimension
• Both algorithmic and computational
42. Eureka + Video Data
• Challenge 2: gap between available training sets and real-world data
UCF101 data set (Soomro et al.) Surveillance Video on Forbes Avenue near CMU
43. Eureka + Video Data
• Challenge 3: complex search conditions
• “Search for men”
• “Search for a man in a red shirt running after a child on the street side”
• Features needed:
• Combining techniques for video (frame sequences) and frames (static images)
• Nested detection
• Correlating timestamps and locations
• Feeding results back into Eureka’s iterative workflow
44. Other Multidimensional Data

• Examples:
• Whole-slide image pyramids in digital pathology
• Map data
• Challenges:
• Query interface
• Efficient computation
• What can be discovered from the data?

Let’s build the telescope so that domain experts can discover craters.
45. Agenda
• The Problem
• Thesis Statement
• Overview of Eureka
• Research Thrusts
• Related Work
• Timeline
46. Related Work: DNN Training and Inference

• DNN structures
• AlexNet, VGG, Inception, MobileNet, Faster R-CNN, …
• Software libraries
• TensorFlow (Google), PyTorch (Facebook), …
• Hardware accelerators
• Movidius (Intel), TPU (Google), …
• On constrained hardware
• Model compression, model quantization, …
• Video analytics
• VideoStorm, NoScope, Focus, FilterForward, BlazeIt, VideoEdge, …
• Premise: a “good” model exists to detect the object of interest
• Ask: “how to run it faster?”
• Batch or stream processing
• My thesis is intrinsically human-in-the-loop and interactive, and has no good model to begin with
47. Related Work: Training Data Augmentation and Synthesis

• Traditional data augmentation
• More intelligent data synthesis based on computer graphics and machine learning
(Source: Dwibedi et al., ICCV’17)

Problem: not truly diverse examples.
48. Related Work: Human-Sourced Labeling

• Crowd-sourcing
• Only useful for common targets
• Human-computer interaction, active learning, etc. (CVPR’13, ECCV’16, etc.)
• Expert/crowd-sourcing
• Medical literature screening (HCOMP’15, etc.)
• Snorkel
• Ask experts to write “labeling functions”
• Infer labels using statistical models (NeurIPS’16, VLDB’18, etc.)
• My thesis proposal: targets visual data and domain experts, with no coding barrier