5. Solution: Use Neural Networks to Generate
Descriptive Tags For Every Image
building
architecture
temple
pyramid
stone
etc...
6. Roadmap
- What are Neural Networks?
- The First Pass: Category Classification
- Lots of Tags: OCR Tagging Using Related Images
- Image Captioning
- Presenting: SherlockNet Interface
12. We trained a CNN to classify all 1M images into
one of 12 categories
people: 0.80
architecture: 0.12
diagrams: 0.05
object: 0.02
decoration: 0.01
(confidence scores)
13. We trained a CNN to classify all 1M images into
one of 12 categories
81% top-1 accuracy
97% top-3 accuracy
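Top-1 / top-3 accuracy figures like the ones above can be computed with a small helper. This is a generic sketch with made-up toy scores, not the project's actual evaluation code:

```python
def topk_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for s, y in zip(scores, labels):
        topk = sorted(range(len(s)), key=lambda c: s[c], reverse=True)[:k]
        hits += y in topk
    return hits / len(labels)

# Toy example: 2 images, 3 categories
scores = [[0.80, 0.12, 0.08],   # true label 0 -> a top-1 hit
          [0.20, 0.50, 0.30]]   # true label 2 -> only a top-2 hit
labels = [0, 2]
print(topk_accuracy(scores, labels, 1))  # 0.5
print(topk_accuracy(scores, labels, 3))  # 1.0
```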
20. We “vectorize” images and minimize Euclidean
distance to obtain related images
CNN
<3,4,-1,-3,4>
21. We “vectorize” images and minimize Euclidean
distance to obtain related images
CNN
<3,4,-1,-3,4>
<2,4,-1,-5,3>
<-3,2,5,3,-3>
<-1,0,0,5,1>
<3,3,0,-3,5>
D = 6
D = 161
D = 106
D = 3
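The D values on these slides are consistent with *squared* Euclidean distance between the five-dimensional feature vectors. A minimal sketch of the nearest-neighbour lookup, using the vectors from the slide:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

query = [3, 4, -1, -3, 4]
candidates = [
    [2, 4, -1, -5, 3],   # D = 6
    [-3, 2, 5, 3, -3],   # D = 161
    [-1, 0, 0, 5, 1],    # D = 106
    [3, 3, 0, -3, 5],    # D = 3  <- most related image
]
dists = [sq_dist(query, c) for c in candidates]
print(dists)  # [6, 161, 106, 3]
nearest = candidates[dists.index(min(dists))]
```

In practice the real feature vectors come from a CNN layer and have thousands of dimensions, but the lookup logic is the same.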
25. We then had similar images “vote” on tags
bird
tree
london
park
stick
plant
wing
claws
beak
nuts
wing
pacific
species
people
rainbow
pair
bird
park
wing
description
species
tree
beak
london
perch
eye
bird
bench
26. We pooled surrounding text from similar images
bird
park
wing
species
beak
This makes the tags for each image much cleaner
and more refined
32. Motivation
- Most natural way of showcasing images
- Opportunities to provide contextual information, e.g. “the man next to the woman”
- From an AI research standpoint: interesting theoretical challenges
33. Background
- Combining two distinct neural networks (CNNs and RNNs) to do end-to-end processing
- A very active area of research
34. Challenges
- High-quality photographs vs. low-res, black & white illustrations
- Ambiguity in detail levels
- Difficulty in obtaining ground truth data (“machines can’t learn without prior knowledge”)
41. A New Dataset
- British Museum Prints and Drawings Collection
- ~200,000 images available through the public interface
- Many have good, human-annotated captions
- Potential for machine learning research?
(From www.britishmuseum.org Online catalogue)
46. SherlockNet will one day provide multiple levels of
high-quality text annotation for every image
Tags: Architecture, landscape, river, trees, boat
Caption: A boat on a tree-lined river in front of a building
47. Acknowledgements
The British Library
Mahendra Mahey
Adam Farquhar
Hana Lewis
Adrian Edwards
Elliot Crowley
Mario Klingemann
Ben O'Steen
Stanford University
Andrej Karpathy
Justin Johnson
Stefano Ermon
The British Museum
51. Neural networks reveal image features that become
more or less frequent over time
Feature #541 is highly activated in modern decorations compared to antique decorations
52. Images with high score for Feature #541 vs. images with low score for Feature #541
Feature #541 is highly activated in modern decorations compared to antique decorations
53. Feature #541 probably indicates the presence of lines
delineating the top and bottom of the decoration.
54. Process + Results
- Decorations: 64% accuracy; Maps: 52% accuracy (compared to 16% and 20% random-chance accuracy, respectively)
- Pretty good results given the inherent limitations!
Talk about the British Library’s Flickr Commons collection.
It contains more than a million images from the British Library’s digitized collection of over 65,000 books, spanning the 15th to 19th centuries.
Subjects: literature, science, anthropology, and many more.
Put online by the British Library for researchers and the public to use in novel and interesting ways.
The current tags (date, volume, page) are not very useful.
Talk about neural networks: convolutional neural networks, or CNNs for short.
This bleeding-edge computer vision technology has been used in the past couple of years to perform image recognition with extremely successful results...
...even outperforming humans!
Final project goal: use neural networks to generate descriptive tags for every image in the British Library Flickr collection.
In our project we used CNNs to:
- classify each image into a category
- find related images
- generate captions
CNNs are very, very good at the above tasks. We’ll talk briefly about why this is, and how they work.
At a high level, a neural network takes an input and, for each possible category, it computes a score. A higher score means the input is more likely to be in that category.
In the process of computing the scores, the input is passed through multiple layers. At each layer, the neural network is “activated” by features of increasing levels of complexity. These activations are determined by parameters that the neural network learns over time.
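Per-category confidences like those on slide 12 (people: 0.80, architecture: 0.12, ...) are typically obtained by passing the raw scores through a softmax, which turns them into probabilities that sum to 1. A minimal sketch (the raw scores here are made up, and the project's exact output layer is an assumption):

```python
import math

def softmax(scores):
    """Turn raw per-category scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for 5 categories
probs = softmax([4.0, 2.1, 1.2, 0.3, -0.4])
# The largest raw score gets the largest probability, and ordering is preserved
```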
The concept is analogous to the activation of biological neurons that form the communication network of our brain and spinal cord.
The multiple layers of a neural network allow it to recognize complex patterns: the more neurons (computational power), the more complex the patterns it can recognize, and thus the better the classification results.
Convolutional neural networks are specialized for images as input. Because images have width, height, and depth, CNNs optimize for this structure by computing activations only over small regions of the input. For example, a layer can look for localized features like an edge or a blotch of color.
That is, a CNN can recognize a visual feature that appears in multiple places, facing different ways, at different angles and sizes, etc.
The name, convolution, comes from the mathematical operation that is performed between the input and the neural network’s parameters at each layer.
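The operation is, up to a flip of the kernel, a sliding dot product between a small filter and each patch of the image (deep-learning libraries usually implement the unflipped version, cross-correlation). A minimal sketch with a hypothetical vertical-edge-detecting kernel:

```python
def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation: slide the kernel over the image,
    taking a dot product with each patch."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A tiny image with a dark-to-bright edge down the middle,
# and a kernel that responds to exactly that kind of edge
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]] -- activates only at the edge
```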
Tie this into Brian’s 12 categories.
Explain SherlockNet Labs.
Signpost what we’re going to do:
We are calling the next two sections “SherlockNet Labs”: tasks that we think are a little beyond what’s currently feasible with neural nets. This project allowed us the opportunity to explore them a bit, and our hope is that we can inspire further research in these topics.
Why do we need/want captions?
Captions are the most natural way of showcasing images.
Around 5% to 10% of the dataset
Two years ago I studied abroad at Oxford. There I did a tutorial on the history of British architecture.
One thing I found really challenging was trying to find records of architecture in books. I spent hours sitting in the Bodleian hunting down books.
Oxford, architecture
To recap, we have tagged over 1 million images with convolutional neural networks and generated hundreds of thousands of human-readable captions for them, while providing an interface for people to explore them easily. Our ultimate hope for this project is that it serves as a prototype for any digital collection: making content more discoverable, dramatically cutting down on years of manual labelling, and providing tools to discover deeper insights into the rich, rich materials that history has left us.
We want to thank all of our mentors and collaborators, especially British Library Labs and Mahendra Mahey, for working with us through multiple time zones to make this project happen. It’s been great fun. Thank you!