1. Computer Vision with Humans in the Loop
David Jacobs (University of Maryland, College Park)
Introduction
BIOTRACKER:
Combines computer vision, state-of-the-art mobile phone technologies, and the internet
Encourages science enthusiasts to gather biological data
Helps scientists identify new species
Several projects fall under the BIOTRACKER umbrella
Clustering Images with Humans in the Loop
Subclustering: summarizing large image databases
Odd Leaf Out: a computer game to identify labeling errors
And many others!
Active Image Clustering (Biswas and Jacobs)
Goal: improve clustering performance while minimizing total human effort
Cluster images using pairwise constraints (must-link and can't-link) supplied by humans
Main contribution: find the best image pair to query out of the O(N^2) possible image pairs
Estimate the effect of each image pair on the overall clustering
Choose the pair for which the expected change in clustering is maximal
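The selection rule above can be illustrated with a small sketch. The poster does not say which clustering algorithm or probability model is used, so everything here is an assumption: a toy single-link clustering that honors must-link and can't-link constraints, a hypothetical heuristic p_same = exp(-d / scale) for the chance that a human answers "must-link", and a pair-disagreement count as the measure of clustering change. The names constrained_single_link, disagreement, and best_query are hypothetical, not the authors' code.

```python
import itertools
import math


def constrained_single_link(dist, k, must=(), cannot=()):
    """Toy single-link clustering of n points down to k clusters.

    Must-link pairs are merged first; any merge that would put a
    can't-link pair in the same cluster is skipped. Returns one label per point.
    """
    n = len(dist)
    label = list(range(n))

    def violates(a, b):
        # True if merging clusters a and b would join some can't-link pair.
        return any({label[i], label[j]} == {a, b} for i, j in cannot)

    def merge(a, b):
        for p in range(n):
            if label[p] == b:
                label[p] = a

    for i, j in must:
        if label[i] != label[j] and not violates(label[i], label[j]):
            merge(label[i], label[j])

    # Kruskal-style pass over point pairs, closest first (single link).
    pairs = sorted(itertools.combinations(range(n), 2),
                   key=lambda e: dist[e[0]][e[1]])
    for i, j in pairs:
        if len(set(label)) <= k:
            break
        a, b = label[i], label[j]
        if a != b and not violates(a, b):
            merge(a, b)
    return label


def disagreement(l1, l2):
    """Count point pairs that are together in one clustering but apart in the other."""
    return sum((l1[i] == l1[j]) != (l2[i] == l2[j])
               for i, j in itertools.combinations(range(len(l1)), 2))


def best_query(dist, k, must=(), cannot=(), scale=1.0):
    """Return the unqueried pair with the largest expected change in clustering."""
    base = constrained_single_link(dist, k, must, cannot)
    asked = set(must) | set(cannot)
    best, best_gain = None, -1.0
    for i, j in itertools.combinations(range(len(dist)), 2):
        if (i, j) in asked:
            continue
        # Assumed heuristic: closer images are more likely to be must-linked.
        p_same = math.exp(-dist[i][j] / scale)
        if_must = constrained_single_link(dist, k, list(must) + [(i, j)], cannot)
        if_cannot = constrained_single_link(dist, k, must, list(cannot) + [(i, j)])
        gain = (p_same * disagreement(base, if_must)
                + (1 - p_same) * disagreement(base, if_cannot))
        if gain > best_gain:
            best, best_gain = (i, j), gain
    return best, best_gain


pos = [0.0, 1.0, 2.0, 6.0, 10.0, 11.0]
dist = [[abs(a - b) for b in pos] for a in pos]
pair, gain = best_query(dist, k=2)
print(pair, round(gain, 3))
```

Each candidate pair triggers two re-clusterings, which is why the brute-force loop over all O(N^2) pairs only suits toy data; a practical system would need a cheaper estimate of each pair's effect.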
Experimental Results
Clustering performance is evaluated using the Relative Jaccard's Coefficient with respect to ground truth
We use two different domains (leaves and faces):
leaf dataset (subset of the database collected for Leafsnap)
face dataset (subset of the PubFig dataset)
[Figure: experimental results on (a) Leaf-1042 and (b) Face-500]
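For readers unfamiliar with the metric, a minimal sketch of the standard pair-counting Jaccard coefficient between a clustering and ground truth follows. The poster does not define the "relative" normalization, so this sketch computes only the plain coefficient; the function name pair_jaccard is hypothetical.

```python
from itertools import combinations


def pair_jaccard(pred, truth):
    """Pair-counting Jaccard coefficient between a clustering and ground truth.

    SS = pairs together in both, SD = together only in pred,
    DS = together only in truth; Jaccard = SS / (SS + SD + DS).
    """
    ss = sd = ds = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            ss += 1
        elif same_pred:
            sd += 1
        elif same_true:
            ds += 1
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


print(pair_jaccard([0, 0, 0, 0], [0, 0, 1, 1]))  # 2 / 6 pairs agree
```

Because it compares pairs rather than label values, the coefficient is unaffected by how clusters are numbered.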
Active Subclustering (Biswas and Jacobs)
[Figure: active vs. passive subclustering, yielding different final subclustering outputs]
Clustering large datasets is hard, even with humans in the loop
Instead, cluster only a subset of the data; this is useful in many applications
Odd Leaf Out (Hansen et al.)
Odd Leaf Out is an online game.
The game helps refine large image databases for computer vision research.
It is fun for players while providing useful information for vision researchers and biology enthusiasts.
Research Questions:
How do we build a game that is interesting, simple and useful?
How can we motivate users to continue to play when we are dealing with some
imperfect data that will sometimes provide two “correct” answers?
How do we choose the game elements (in Odd Leaf Out, a set of six images)?
How can data provided by novice users be employed to enhance the work of experts?
Game Design
Selection of image sets: Each set contains five images from one species and one from a different species. Any leaf in our database can serve as the seed leaf of a set (call it Li1, from species S). The other five leaves are chosen as follows:
Seed leaf (Li1), from species S
Least similar leaf to the seed leaf within S (Li2)
Distinct randomly chosen leaf from S (Li3)
Distinct randomly chosen leaf from S (Li4)
Distinct randomly chosen leaf from S (Li5)
A leaf from a species other than S (Lj); the difficulty of the set depends on the distance between Lj and Li1
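The selection procedure above can be sketched as follows. This is a hypothetical illustration, not the project's code: the leaf database is assumed to be a dict from leaf id to species, dist(a, b) is an assumed image-distance function, and the hard flag is an assumed way of controlling difficulty (hard picks the other-species leaf closest to the seed, easy picks the farthest).

```python
import random


def make_odd_leaf_set(seed, species, dist, hard=True, rng=None):
    """Build a six-leaf set around `seed`; returns (shuffled_tiles, odd_leaf).

    species: dict leaf_id -> species name; dist(a, b): image distance.
    """
    rng = rng or random.Random()
    s = species[seed]
    same = [x for x in species if species[x] == s and x != seed]
    other = [x for x in species if species[x] != s]
    if len(same) < 4 or not other:
        raise ValueError("need 4 more leaves from the seed species and 1 from another")
    li2 = max(same, key=lambda x: dist(seed, x))  # least similar leaf in S
    # Three distinct random leaves of S, excluding Li2.
    li3, li4, li5 = rng.sample([x for x in same if x != li2], 3)
    # Assumption: a hard set uses the other-species leaf closest to the seed,
    # an easy set the farthest one.
    ranked = sorted(other, key=lambda x: dist(seed, x))
    lj = ranked[0] if hard else ranked[-1]
    tiles = [seed, li2, li3, li4, li5, lj]
    rng.shuffle(tiles)  # hide the odd leaf's position
    return tiles, lj


pos = {"a1": 0, "a2": 1, "a3": 2, "a4": 3, "a5": 4, "a6": 5, "b1": 6, "b2": 20}
species = {leaf: leaf[0].upper() for leaf in pos}


def d(a, b):
    return abs(pos[a] - pos[b])


tiles, odd = make_odd_leaf_set("a1", species, d, hard=True, rng=random.Random(0))
print(tiles, odd)
```

Including Li2, the least similar leaf of the seed's own species, makes the within-species variation visible to the player, so the task is not solved simply by spotting any leaf that looks slightly different.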
Game versions: There are four versions of the game: Three Lives, Contestation, Multiple Guesses, and Skip.
Database: For all our experiments, we use the leaf dataset collected as part of a project called Leafsnap, an iPhone application developed by researchers at the University of Maryland, Columbia University, and the Smithsonian Institution. The iPhone application is now available in the Apple App Store.
What Do We Get From This Game?
Identify errors in the dataset
Discover whether color helps humans identify leaves (caution: leaf color changes over the course of the year)
Feedback on how enjoyable or difficult the game is, which we use to improve the game.
[Figure: the game interface]
Example Cases
Two sample scenarios that can arise when labels are wrong are described here; in practice we observe many other scenarios as well.
When the odd leaf is mislabeled, it may actually belong to the same species as the other five leaves.
Players then pick each of the six leaves with equal probability.
When one of the non-odd leaves is mislabeled, the set contains two different-looking leaves.
Players then pick the odd leaf and the mislabeled leaf with equal probability.
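The two scenarios suggest how player picks could flag label errors. The sketch below is a hypothetical heuristic, not the project's analysis: the function name diagnose_set and both thresholds (uniform_max, suspect_min) are illustrative assumptions. A near-uniform pick distribution flags the odd leaf; a non-odd leaf that draws a large share of picks is flagged as a suspect.

```python
from collections import Counter


def diagnose_set(picks, tiles, odd, uniform_max=1 / 3, suspect_min=0.3):
    """Flag possible label errors in one six-leaf set from players' picks.

    picks: list of tile ids chosen by players; odd: the tile labeled as odd.
    Thresholds are illustrative assumptions, not values from the project.
    """
    if not picks:
        return "no data", []
    counts = Counter(picks)
    share = {t: counts[t] / len(picks) for t in tiles}
    # Scenario 1: picks spread over all leaves, so the odd leaf may be mislabeled.
    if max(share.values()) <= uniform_max:
        return "odd leaf may be mislabeled", [odd]
    # Scenario 2: a non-odd leaf draws many picks, so it may be mislabeled.
    suspects = [t for t in tiles if t != odd and share[t] >= suspect_min]
    if suspects:
        return "non-odd leaf may be mislabeled", suspects
    return "no evidence of a label error", []


tiles = ["t0", "t1", "t2", "t3", "t4", "t5"]
print(diagnose_set(tiles * 10, tiles, odd="t5"))
# → ('odd leaf may be mislabeled', ['t5'])
print(diagnose_set(["t5"] * 10 + ["t2"] * 9, tiles, odd="t5"))
# → ('non-odd leaf may be mislabeled', ['t2'])
```

With few plays per set these shares are noisy, so in practice a set would need enough plays, or a statistical test, before a flag is trusted.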
About Biotracker
People in Biotracker: David Jacobs, Jennifer Preece, Derek Hansen, Dana Rotman,
Anne Bowser, Carol Boston, Yurong He, Arijit Biswas, Jen Hammond, Cynthia Parr and
many others!
Publications from Biotracker:
A. Biswas, D. Jacobs. Active Image Clustering: Seeking Constraints from Humans to Complement Algorithms. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
D. Hansen, D. Jacobs, D. Lewis, A. Biswas, J. Preece, D. Rotman, E. Stevens. Odd Leaf Out: Improving Visual Recognition with Games. Proceedings of the IEEE International Conference on Social Computing, Boston, MA, 2011.
J. Ahn, J. Hammock, C. Parr, J. Preece, B. Shneiderman, K. Schulz, D. Hansen, D. Rotman, Y. He. Visually Exploring Social Participation in Encyclopedia of Life. ASE International Conference on Social Informatics, 2012.
D. Rotman, J. Preece, J. Hammock, K. Procita, D. Hansen, C. S. Parr, D. Lewis, D. Jacobs. Dynamic Changes in Motivation in Collaborative Ecological Citizen Science Projects. CSCW, 2012.
D. Rotman, K. Procita, D. Hansen, C. Sims Parr, J. Preece. Supporting Content Curation Communities: The Case of the Encyclopedia of Life. J. Am. Soc. Inf. Sci., 2012.
N. Kumar, P. N. Belhumeur, A. Biswas, D. Jacobs, W. J. Kress, I. Lopez, J. V. B. Soares. Leafsnap: A Computer Vision System for Automatic Plant Species Identification. European Conference on Computer Vision (ECCV), 2012.
Conclusion
Improved image clustering with humans in the loop
Clustering a subset of a dataset
Finding labeling errors in large image databases
Many other projects are under way!
Acknowledgement: This work was supported by NSF grant #0968546.
University of Maryland, College Park | Email: arijit@cs.umd.edu | Web: http://biotrackers.net/