5. Context is important
Checker shadow illusion (Edward Adelson, neuroscientist, MIT): the squares labeled A and B are the same color.
7. But sometimes it gets ambiguous...
"Can't play Spain? Improve your playing via easy step-by-step video lessons!" (Here "Spain" is a piece of music, not the country.)
10. But sometimes it gets ambiguous...
"Mom is a great TV show" (Is "Mom" a mother, or the title of the show?)
11. NER as a sequence-labeling problem
➔ Processing one word after another
➔ Assigning a label to each word, based on local as well as global features
➔ Labels are B-PER, I-PER, B-LOC, I-LOC, OTHER, etc. (a.k.a. IOB)
Example: I/O am/O working/O for/O Basis/B-ORG Technology/I-ORG
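A minimal Python sketch (the helper name and span format are hypothetical, not from the talk) of how entity spans become per-token IOB labels like the example above:

```python
def to_iob(tokens, spans):
    """Convert entity spans (start, end, type) over a token list into IOB labels."""
    labels = ["O"] * len(tokens)
    for start, end, ent_type in spans:
        labels[start] = f"B-{ent_type}"      # first token of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{ent_type}"      # tokens inside the entity
    return labels

tokens = ["I", "am", "working", "for", "Basis", "Technology"]
print(list(zip(tokens, to_iob(tokens, [(4, 6, "ORG")]))))
# [('I', 'O'), ..., ('Basis', 'B-ORG'), ('Technology', 'I-ORG')]
```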
13. Traditional ML vs. Deep Learning
Traditional ML: "I love this movie" → feature extraction (words, part-of-speech tags, lemmas, Brown clusters) → vectorization into a sparse binary vector [00010010110000101001…001] → modeling → ☺ Positive
Deep learning: "I love this movie" → embeddings lookup, one dense vector per word ([0.323, -0.3434, 0.901, …, -0.267], [-0.4923, 0.554, 0.001, …, -0.365], …) → modeling → ☺ Positive
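A hedged sketch of the deep-learning branch of that pipeline: instead of hand-crafted sparse features, each word index is looked up in a dense embedding table. The vocabulary and dimensions here are toy values.

```python
import torch
import torch.nn as nn

vocab = {"<unk>": 0, "i": 1, "love": 2, "this": 3, "movie": 4}
embeddings = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)

ids = torch.tensor([vocab[w] for w in "i love this movie".split()])
vectors = embeddings(ids)   # shape (4 words, 4 dims): one dense row per word
print(vectors)
```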
14. Word embeddings
Vector arithmetic captures analogies: Tokyo - Japan + Germany = Berlin
[Figure: 2-D projection of word embeddings; nearby points include Germany, German, Europe, European, Africa]
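The analogy on this slide can be reproduced with any pretrained vectors; a sketch using gensim's downloader (the "glove-wiki-gigaword-100" model is one assumption, and its vocabulary is lowercased):

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")
# Tokyo - Japan + Germany ≈ ?
print(wv.most_similar(positive=["tokyo", "germany"], negative=["japan"], topn=1))
# "berlin" is expected to rank first
```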
15. Feed-forward network for NER
[Figure: a window of words ("... while I listen to Spain ...") feeds through Layer 1 and Layer 2 to an output layer over labels (B-PER, I-PER, B-LOC, ...)]
Natural Language Processing (Almost) from Scratch (Collobert et al., 2011)
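A minimal sketch of such a window-based feed-forward tagger in the spirit of Collobert et al. (2011), not their exact model: embeddings of a word and its neighbors are concatenated and passed through two hidden layers. Window size and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class WindowTagger(nn.Module):
    def __init__(self, vocab_size, n_labels, emb_dim=50, window=5, hidden=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(window * emb_dim, hidden), nn.Tanh(),   # Layer 1
            nn.Linear(hidden, hidden), nn.Tanh(),             # Layer 2
            nn.Linear(hidden, n_labels),                      # Output
        )

    def forward(self, window_ids):                 # (batch, window)
        e = self.emb(window_ids)                   # (batch, window, emb_dim)
        return self.net(e.flatten(start_dim=1))    # label scores for centre word

tagger = WindowTagger(vocab_size=10_000, n_labels=5)
scores = tagger(torch.randint(0, 10_000, (1, 5)))
```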
19. Recurrent neural network (RNN)
➔ At each time step we process one word, concatenated with the output from the previous time step
➔ It remembers information for many time steps
20. Recurrent neural network (RNN)
[Figure: the RNN unrolled over time steps t-1, t, t+1, predicting B-PER, I-PER, OTHER]
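A hedged sketch of a single RNN step, making the concatenation in the bullet concrete (weights and dimensions are illustrative):

```python
import torch

emb_dim, hid_dim = 4, 3
W = torch.randn(hid_dim, emb_dim + hid_dim)
b = torch.zeros(hid_dim)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W [x_t ; h_{t-1}] + b)
    return torch.tanh(W @ torch.cat([x_t, h_prev]) + b)

h = torch.zeros(hid_dim)
for x in torch.randn(5, emb_dim):   # five word vectors
    h = rnn_step(x, h)              # the hidden state carries context forward
```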
21. Long Short-Term Memory (LSTM)
➔ It can forget information when necessary
[Figure: LSTM cells unrolled over time steps t-1, t, t+1, predicting B-PER, I-PER, OTHER]
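A stripped-down sketch of the LSTM cell update, to make "it can forget" concrete: the forget gate f_t scales the previous cell state, so values of f_t near 0 erase old memory. Weights and sizes are illustrative, not a full implementation.

```python
import torch

def lstm_step(x, h_prev, c_prev, W, b):
    z = W @ torch.cat([x, h_prev]) + b
    i, f, o, g = z.chunk(4)                      # gate pre-activations
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    c = f * c_prev + i * torch.tanh(g)           # forget old state, write new
    h = o * torch.tanh(c)
    return h, c

emb_dim, hid = 4, 3
W, b = torch.randn(4 * hid, emb_dim + hid), torch.zeros(4 * hid)
h = c = torch.zeros(hid)
h, c = lstm_step(torch.randn(emb_dim), h, c, W, b)
```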
22. LSTM for sequence labeling
[Figure: an LSTM processes "Washington said in Chicago last ..." left to right, emitting one label per word: B-PER, OTHER, OTHER, B-LOC, OTHER]
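A hedged sketch of this tagger: embed each word, run an LSTM left to right, and project every hidden state to label scores. All sizes are toy values.

```python
import torch
import torch.nn as nn

class LSTMTagger(nn.Module):
    def __init__(self, vocab_size, n_labels, emb_dim=50, hid_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, n_labels)

    def forward(self, ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.emb(ids))     # (batch, seq_len, hid_dim)
        return self.out(h)                  # one score vector per word

tagger = LSTMTagger(vocab_size=10_000, n_labels=5)
scores = tagger(torch.randint(0, 10_000, (1, 6)))  # e.g. "Washington said in Chicago last ..."
```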
23. Bidirectional LSTM for sequence labeling
[Figure: forward and backward LSTMs run over "Washington said in Chicago last ..."; their outputs are combined (+) at each position to predict B-PER, OTHER, OTHER, B-LOC, OTHER]
Bidirectional LSTM-CRF Models for Sequence Tagging (Huang et al., 2015)
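In the same sketch, the bidirectional variant is a one-flag change: PyTorch runs a second LSTM right to left and concatenates both directions at each position (the "+" in the figure may denote summation instead; concatenation is PyTorch's convention).

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=50, hidden_size=64, batch_first=True,
               bidirectional=True)
out = nn.Linear(2 * 64, 5)   # 2 * hid_dim: the two directions are concatenated
```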
24. Multilayer LSTM for sequence labeling
[Figure: a second bidirectional LSTM layer is stacked on the first; the top layer's outputs predict the labels]
25. Multilayer LSTM for sequence labeling
[Figure: the same network with one more stacked bidirectional LSTM layer; each layer feeds the one above]
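Stacking layers, as on these two slides, is again one argument in the same sketch (three layers here is an arbitrary choice): each LSTM layer feeds its output sequence to the next.

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=50, hidden_size=64, num_layers=3,
               batch_first=True, bidirectional=True)
```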
26. Alternative decoding using Conditional Random Fields (CRF)
[Figure: the bidirectional LSTM over "Washington said in Chicago last ..." feeds a CRF layer, which predicts the whole label sequence B-PER, OTHER, OTHER, B-LOC, OTHER jointly]
27. Alternative decoding using Conditional Random Fields (CRF)
[Figure: at each position the CRF considers every candidate label (B-PER, I-PER, B-LOC, I-LOC, OTHER), forming a lattice over the sentence]
28. Decoding with CRF
[Figure: one path through the label lattice is highlighted: the global score of a specific sequence of labels]
29. Decoding with CRF
The global score of a specific sequence of labels includes learned transition scores T between adjacent labels, e.g. T[OTHER, I-PER] < T[B-PER, I-PER]: an I-PER tag is plausible right after B-PER, but not right after OTHER.
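A hedged NumPy sketch of that global score (the function name and the label-to-index order [B-PER, I-PER, B-LOC, I-LOC, OTHER] are illustrative): per-word emission scores come from the BiLSTM and transition scores from the CRF; decoding then searches for the highest-scoring sequence, typically with the Viterbi algorithm.

```python
import numpy as np

def path_score(emissions, T, labels):
    """emissions: (seq_len, n_labels); T[i, j]: score of label i -> label j."""
    score = emissions[0, labels[0]]
    for t in range(1, len(labels)):
        score += T[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    return score

rng = np.random.default_rng(0)
emissions, T = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
print(path_score(emissions, T, [0, 4, 4, 2, 4]))  # B-PER OTHER OTHER B-LOC OTHER
```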
32. Character encoding results
              English   Arabic   Korean
BiLSTM          83.5      80.3     82.3
BiLSTM+Char     85.1      82.5     86.0
*Results are F-score measured over Basis’ evaluation set.
33. Char encode, word encode, decode
[Figure: three-stage architecture; "Washington said in Chicago last ..." passes through char encoding, then word encoding, then decoding into labels]
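A hedged sketch of this three-stage architecture (one of several reported combinations; a character LSTM, word BiLSTM, and linear decoder are chosen here, and all sizes are toy values): a character-level LSTM builds a vector per word, it is concatenated with the word embedding, a bidirectional word-level LSTM encodes the sentence, and a linear layer decodes labels. A CRF could replace the linear decoder.

```python
import torch
import torch.nn as nn

class CharWordTagger(nn.Module):
    def __init__(self, n_chars, n_words, n_labels,
                 char_dim=25, word_dim=50, hid=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_dim, batch_first=True)
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.word_lstm = nn.LSTM(word_dim + char_dim, hid,
                                 batch_first=True, bidirectional=True)
        self.decode = nn.Linear(2 * hid, n_labels)

    def forward(self, word_ids, char_ids):
        # char_ids: (seq_len, max_word_len) - the characters of each word
        _, (h_char, _) = self.char_lstm(self.char_emb(char_ids))
        char_vecs = h_char[-1]                        # (seq_len, char_dim)
        words = torch.cat([self.word_emb(word_ids), char_vecs], dim=-1)
        enc, _ = self.word_lstm(words.unsqueeze(0))   # add batch dim
        return self.decode(enc)                       # (1, seq_len, n_labels)

tagger = CharWordTagger(n_chars=100, n_words=10_000, n_labels=5)
scores = tagger(torch.randint(0, 10_000, (6,)), torch.randint(0, 100, (6, 10)))
```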
34. Reported combinations
                          Char encoder   Word encoder   Decoder
Collobert et al. (2011)   None           CNN            CRF
Mesnil et al. (2013)      None           RNN            RNN
Nguyen et al. (2016)      None           RNN            GRU
Huang et al. (2015)       None           LSTM           CRF
Lample et al. (2016)      LSTM           LSTM           CRF
Chiu & Nichols (2016)     CNN            LSTM           CRF
Zhai et al. (2017)        CNN            LSTM           LSTM
Yang et al. (2016)        GRU            GRU            CRF
Strubell et al. (2017)    None           Dilated CNN    CRF
Shen et al. (2018)        CNN            CNN            LSTM
Borrowed from Shen et al. (2018)
36. The dying algorithm
By Siddhartha Mukherjee, Jan 2018: an algorithm that predicts death for oncological patients.
"Here is the strange rub of such a deep learning system: It learns, but it cannot tell us why it has learned...
...the algorithm looks vacantly at us when we ask, Why? It is, like death, another black box."
37. Bidirectional LSTM for NER
[Figure: the bidirectional LSTM tagger over "Washington said in Chicago last ...", predicting B-PER, OTHER, OTHER, B-LOC, OTHER]
38. What does LSTM actually learn?
[Figure: the same bidirectional LSTM tagger over "Washington said in Chicago last ..."]
39. What does LSTM actually learn?
[Figure: the same network, with a single cell vector singled out]
Let's look at this cell vector over time...
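A sketch of the inspection the slide proposes (random vectors stand in for real word embeddings): step an LSTMCell through the sentence manually and record the cell state c_t at each word, so a single dimension of c can be tracked over time.

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=50, hidden_size=64)
h, c = torch.zeros(1, 64), torch.zeros(1, 64)

cell_states = []
for x in torch.randn(6, 1, 50):       # "Washington said in Chicago last ..."
    h, c = cell(x, (h, c))
    cell_states.append(c.squeeze(0))  # snapshot of the cell vector

trace = torch.stack(cell_states)      # (seq_len, 64): one row per time step
print(trace[:, 0])                    # one cell's value across the sentence
```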