Roland Memisevic at AI Frontiers: Common sense video understanding at TwentyBN

Twenty Billion Neurons
Berlin & Toronto based Video Understanding Company

DOMESTIC COMPANIONS (85M smart cameras)
AUGMENTED REALITY (6M AR glasses)
AUTOMOTIVE (10M cars)
COLLABORATIVE ROBOTICS (150M cobots)
SMARTPHONE APPS (3 BN phones)
By 2020: CONSUMER VIDEOS (80% of Internet traffic)
All figures are estimated numbers of devices in 2020.
Sources: KPCB, Barclays

1986: “Neural networks don’t work”
2012: “Neural networks can’t do image classification”
2014: “Neural networks can’t translate text”
2016: “Neural networks can’t play Go”
2017: “Neural networks don’t have common sense”
?

At TwentyBN we build the brain that allows cameras to see
Roland Memisevic
CEO & Chief Scientist
15+ years experience in DL as Professor (MILA Montreal) & PhD student of Geoff Hinton
Moritz Müller-Freitag
COO & Head of Product
Experience as data scientist (Eleven) & country manager (Savedo/HitFox Group)
Ingo Bax
CTO
Experience as Professor (FH Münster) & principal software architect (XING AG)
Christian Thurau
CBDO
Experience as Co-founder, CTO (Game Analytics, exit) & researcher (Fraunhofer)
Prof. Yoshua Bengio
Scientific Advisor
Professor at MILA Montréal; noted for his pioneering work on deep learning
Valentin Haenel
VP Engineering
Co-initiator of PyData Berlin; contributor to more than 50 open source projects
Nathan Benaich
Advisor
VC investor, technologist, former scientist; organizer of London.ai and RAAIS
+ 13 full-time staff, including AI researchers, engineers and product people

Integrated technology stack
1 Research & engineering
2 Data platform
3 Embedded real-time net
4 Solutions

Camera based gesture control
Existing solutions
● Require depth sensor devices
● ~5 gestures
● Low accuracy
● Never gained traction
TwentyBN solution
● RGB (for example, cheap, built-in laptop camera)
● Recognizes 25 hand gestures
● Very high accuracy
● Runs in real-time on a laptop using RGB camera input

Variations
Camera angles and scene layouts
Multi-person actions and localization
Interactivity
Complex object interactions

Indoor activity monitoring
Output: “Person picking [something] up”
Output: “[Something] falling like a feather or paper”
Output: “Person leaving through a door”
Output: “Bending [something] until it breaks”
Output: “Trying to bend [something unbendable] so nothing happens”
Output: “[gesture] Zooming Out With Two Fingers”

We support all stages of our clients’ product cycles
Softcore IP
Data licensing
Software licensing
Hardware licensing
Product Description
Software that adds video capabilities to your product
High-quality labeled videos customized to support your video applications

20BN-JESTER
A crowd-acted dataset of generic human hand gestures.
Number of videos: 148,094
License: Free for academic use (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license, CC BY-NC-ND 4.0)
https://www.twentybn.com/datasets/jester

20BN-SOMETHING-SOMETHING
A crowd-acted dataset of basic interactions with everyday objects.
Number of videos: 108,499
License: Free for academic use (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license, CC BY-NC-ND 4.0)
https://www.twentybn.com/datasets/something-something
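
The labels quoted in this deck are written as templates with bracketed placeholders such as [something]. As a minimal sketch of how such a template could be turned into a concrete caption, the Python below fills placeholders from a list of object names. The function names and the example objects are hypothetical and are not taken from the dataset or its tooling; how the dataset itself stores object names is not stated here.

```python
import re

# Matches a bracketed placeholder such as "[something]" or "[part]".
PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


def count_placeholders(template):
    """Return how many bracketed placeholders the template contains."""
    return len(PLACEHOLDER.findall(template))


def fill_template(template, objects):
    """Replace placeholders, left to right, with the given object names.

    If there are fewer objects than placeholders, the leftover
    placeholders are kept unchanged.
    """
    remaining = iter(objects)

    def substitute(match):
        # Fall back to the original placeholder text when objects run out.
        return next(remaining, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


# Illustrative objects only; they are made up for this example.
caption = fill_template("Putting [something] onto [something]", ["a cup", "a table"])
print(caption)  # Putting a cup onto a table
```

Keeping unfilled placeholders intact, rather than raising an error, lets the same function handle partially annotated clips.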

Contrastive classes make learning harder and networks stronger
Tearing [something] into two pieces VS Tearing [something] just a little bit 0.74 (0.52)
Pretending to pick [something] up VS Picking [something] up 0.86 (0.75)
Pretending to pour VS Pouring 0.82 (0.64)
Pouring with overflow VS Pouring without 0.76 (0.54)
Pretending to put [something] onto VS Putting [something] onto [something] 0.82 (0.64)
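
The slide does not say how the two numbers per pair were computed. As a sketch of one way a contrastive pair could be scored, assuming per-clip ground-truth and predicted labels are available, the Python below computes accuracy restricted to clips whose ground truth is one of the two classes. The function name and the toy labels are hypothetical, and the result is not meant to reproduce the figures above.

```python
def pair_accuracy(truths, predictions, class_a, class_b):
    """Accuracy over clips whose ground truth is class_a or class_b."""
    pair = {class_a, class_b}
    # Keep only clips that belong to one of the two contrastive classes.
    selected = [(t, p) for t, p in zip(truths, predictions) if t in pair]
    if not selected:
        raise ValueError("no clips from either class")
    correct = sum(1 for t, p in selected if t == p)
    return correct / len(selected)


# Toy labels, made up for illustration only.
toy_truths = ["Pouring", "Pretending to pour", "Pouring", "Pretending to pour", "Other"]
toy_predictions = ["Pouring", "Pouring", "Pouring", "Pretending to pour", "Other"]

toy_score = pair_accuracy(toy_truths, toy_predictions, "Pouring", "Pretending to pour")
print(toy_score)  # 0.75
```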

Mistaken “opening” predictions
Ground truth: Moving [part] of [something]
Prediction: Opening [something]
Ground truth: Unfolding [something]
Prediction: Opening [something]
Ground truth: Putting [something] on a flat surface without letting it roll
Prediction: Opening [something]

Mistaken “covering” predictions
Ground truth: Putting [something] in front of [something]
Prediction: Covering [something]
Ground truth: Turning [something] upside down
Prediction: Covering [something]
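
The mistakes on the two slides above are grouped by the predicted class. A minimal sketch of that grouping in Python, assuming a list of (ground truth, prediction) pairs is available; the function name is hypothetical, and the example pairs are the ones shown on the slides.

```python
from collections import defaultdict


def group_mistakes(examples):
    """Map each wrongly predicted label to the ground-truth labels it replaced."""
    by_prediction = defaultdict(list)
    for truth, predicted in examples:
        # Correct predictions are not mistakes, so skip them.
        if truth != predicted:
            by_prediction[predicted].append(truth)
    return dict(by_prediction)


# The (ground truth, prediction) pairs shown on the slides.
slide_examples = [
    ("Moving [part] of [something]", "Opening [something]"),
    ("Unfolding [something]", "Opening [something]"),
    ("Putting [something] on a flat surface without letting it roll", "Opening [something]"),
    ("Putting [something] in front of [something]", "Covering [something]"),
    ("Turning [something] upside down", "Covering [something]"),
]

mistakes = group_mistakes(slide_examples)
for predicted, truths in mistakes.items():
    print(predicted, "<-", truths)
```

Grouping by prediction rather than by ground truth makes it easy to see which classes the network over-predicts.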

Roland Memisevic
+1 416 826 1032
roland@twentybn.com
www.twentybn.com
