Outliers and Inconsistency
1. Inconsistency and Outliers
Active Learning by Outlier Detection
Inconsistency Robustness Symposium 2011
Neil Rubens
Assistant Professor
University of Electro-Communications
Tokyo, Japan

2. Outline
Inconsistency Robustness is a multi-disciplinary issue. We discuss some of the aspects of Inconsistency Robustness from the perspective of Machine Learning:
• What is Inconsistency?
• Can Inconsistency be Useful?
• Measuring Inconsistency

4. Outlier Types
• Spatial Outlier
– unlabeled data
• Model Outlier (Our Focus)
– labeled data

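To make the distinction concrete, here is a minimal sketch (not from the talk) assuming 1-D inputs, a mean/standard-deviation rule for spatial outliers, and an ordinary least-squares line standing in for the learned model. The function names and thresholds (`k`, `tol`) are illustrative choices.

```python
from statistics import mean, stdev


def spatial_outliers(xs, k=2.0):
    """Spatial outliers: inputs far from the bulk of the data.

    Uses only the inputs xs, so it works on unlabeled data.
    """
    mu, sd = mean(xs), stdev(xs)
    return [i for i, x in enumerate(xs) if abs(x - mu) > k * sd]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def model_outliers(xs, ys, tol):
    """Model outliers: labeled points whose output disagrees with the fitted model."""
    a, b = fit_line(xs, ys)
    return [i for i, (x, y) in enumerate(zip(xs, ys)) if abs(y - (a + b * x)) > tol]


# Inputs are evenly spread, but the label at x = 5 breaks the y = 2x pattern.
xs = list(range(10))
ys = [2 * x for x in xs]
ys[5] = 20
print(spatial_outliers(xs))           # -> []
print(model_outliers(xs, ys, tol=3))  # -> [5]
```

The point at x = 5 is unremarkable as an input, so a purely spatial check misses it; it only stands out once its label is compared against a model.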
5. Causes of Outliers
• Faulty data
– Entry error, malfunction, etc.
• Chance/Deviation
• Incorrect Model (Our Focus)
http://www.dkimages.com/discover/previews/852/20223083.JPG

6. Typical Treatment of Outliers
• Assume that the learned model is correct and discard points that don't agree with the model

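The typical treatment can be sketched in a few lines, assuming (as an illustration only) a least-squares line as the learned model and a fixed residual tolerance `tol`: fit, drop the points that disagree, refit. The helper names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def discard_disagreeing(xs, ys, tol):
    """Typical treatment: trust the model and discard points it does not explain.

    Returns the kept inputs, the kept outputs, and the line refitted on them.
    """
    a, b = fit_line(xs, ys)
    kept = [(x, y) for x, y in zip(xs, ys) if abs(y - (a + b * x)) <= tol]
    kx = [x for x, _ in kept]
    ky = [y for _, y in kept]
    return kx, ky, fit_line(kx, ky)


xs = list(range(10))
ys = [2 * x for x in xs]
ys[5] = 20  # one point disagrees with the y = 2x trend
kx, ky, (a, b) = discard_disagreeing(xs, ys, tol=3)
print(5 in kx, a, b)  # the point is gone and the refit is (numerically) y = 2x
```

The refit only looks clean because the discarded point was assumed to be wrong; nothing in the procedure checks whether the model itself was the problem.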
7. Our Focus: Atypical Treatment of Outliers
• Assume that data is right, and that the model is wrong

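One way to act on "the data is right, the model is wrong" is to keep every point and revise the model until it agrees with them. The sketch below assumes polynomial models of increasing degree as the family of candidate models and a residual tolerance `tol`; both, and the helper names, are illustrative assumptions rather than the method presented in the talk.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Adequate for small, well-scaled toy data; returns coefficients c with
    y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= f * A[col][k]
            r[row] -= f * r[col]
    # Back substitution.
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (r[i] - sum(A[i][k] * c[k] for k in range(i + 1, n))) / A[i][i]
    return c


def predict(c, x):
    """Evaluate the polynomial with coefficients c at x."""
    return sum(ci * x ** i for i, ci in enumerate(c))


def correct_model(xs, ys, tol, max_degree=5):
    """Atypical treatment: keep all points and raise model complexity
    until every point agrees with the model within tol."""
    for d in range(max_degree + 1):
        c = polyfit(xs, ys, d)
        if all(abs(y - predict(c, x)) <= tol for x, y in zip(xs, ys)):
            return d, c
    return None  # no candidate model explains all of the data


xs = list(range(7))
ys = [x * x for x in xs]  # a straight line disagrees with the points at both ends
degree, coeffs = correct_model(xs, ys, tol=1e-6)
print(degree)  # -> 2
```

Under the typical treatment, the points that disagree with the line would have been discarded; here they are what forces the switch to a quadratic model.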
8. Practicality: Due to abundance of data, one may mistakenly dismiss this problem as impractical. While unlabeled data is abundant, labeled data is rather scarce. Even if, overall, the amount of labeled data is large enough, there may still be a need for additional labeled data so as to enable personalization (a common focus). Moreover, obtaining labeled data could be expensive.
Obtaining data could be "COSTLY":
• Medicine
– diagnosis: pain, time, $
– drug discovery: $$$, time
• User Interaction: effort, time
• Expertise Elicitation: $, time
[Figure: inputs x1, x2 and output y]

9. Contributions
a) State the problem: The goal of machine learning is to learn an accurate predictive model from the data. Data that is inconsistent with the learned model and/or existing data is referred to as an outlier. This phenomenon occurs frequently during the early stages of the learning process [7], [6], or in a non-stationary environment in which changes may occur in the underlying model [2].
b) Say why it's an interesting problem: Not all of the outliers are bad. ...
c) Say what your solution achieves: ...
d) Say what follows from your solution: If we discard outliers, we might be discarding the most informative data points.
[Figure: Unlabeled Data, Sampling, Multiple Hypothesis, Hypothesis/Model Selection, f(x, …)]

10. Learned model is often assumed to be approximately correct ... assumption that the current model is accurate, and requires just some tweaking. However, if the current model is inaccurate, it should be changed significantly, instead of ignoring the incompatibility and continuing to make minor tweaks. This issue is exacerbated in AL settings, in which ... gradient descent ... (except that the number of samples we can make is very small).
• Consistent Sample
– Does not allow the # of hypotheses to be reduced
– Little is learned
• Inconsistent Sample
– Will not agree with some of the hypotheses (regardless of the output values)
– Number of hypotheses is reduced
Outliers May Be Good ...
[Figure: samples in the input space (x1, x2) with output y]

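The Consistent/Inconsistent Sample contrast can be illustrated with version-space reduction, the idea behind query-by-committee style active learning. This sketch assumes a toy hypothesis class of 1-D threshold classifiers (label 1 if x >= t) over integer thresholds t; the class, the pool, and the helper names are hypothetical.

```python
def predict(t, x):
    """Hypothetical threshold hypothesis: label 1 if x >= t, else 0."""
    return 1 if x >= t else 0


def version_space(hypotheses, labeled):
    """Keep only the hypotheses that agree with every labeled (x, y) pair."""
    return [t for t in hypotheses if all(predict(t, x) == y for x, y in labeled)]


def disagreement(hypotheses, x):
    """Minority fraction of hypotheses on x; 0.0 means all of them agree."""
    ones = sum(predict(t, x) for t in hypotheses)
    return min(ones, len(hypotheses) - ones) / len(hypotheses)


def query(hypotheses, pool):
    """Pick the unlabeled point the current hypotheses disagree on most."""
    return max(pool, key=lambda x: disagreement(hypotheses, x))


H = version_space(range(11), [(1, 0), (9, 1)])  # thresholds 2..9 remain
print(len(H))                                   # -> 8
print(disagreement(H, 0), disagreement(H, 5))   # -> 0.0 0.5
print(query(H, [0, 5, 10]))                     # -> 5
# Consistent sample: its label removes no hypotheses, so little is learned.
print(len(version_space(H, [(0, 0)])))          # -> 8
# Inconsistent sample: either label removes some hypotheses.
print(len(version_space(H, [(5, 1)])), len(version_space(H, [(5, 0)])))  # -> 4 4
```

Querying x = 5 halves the remaining hypotheses whichever label comes back, while labeling x = 0 leaves all eight in place.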
13. Model Selection
(a) under-fit (b) over-fit (c) appropriate fit
Figure 8: Dependence between model complexity and accuracy.
If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.

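The claim about model selection can be checked on a toy problem. This sketch assumes polynomial degree as the measure of model complexity, squared error as the score, and a hand-made data set (the line y = x with small offsets added to the training labels); on ties it prefers the simpler model. All of these, and the helper names, are illustrative assumptions.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (toy data only)."""
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    r = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= f * A[col][k]
            r[row] -= f * r[col]
    c = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        c[i] = (r[i] - sum(A[i][k] * c[k] for k in range(i + 1, n))) / A[i][i]
    return c


def predict(c, x):
    return sum(ci * x ** i for i, ci in enumerate(c))


def sse(c, xs, ys):
    """Sum of squared errors of polynomial c on (xs, ys)."""
    return sum((y - predict(c, x)) ** 2 for x, y in zip(xs, ys))


def select_degree(fit_data, eval_data, max_degree, eps=1e-9):
    """Fit each degree on fit_data, score it on eval_data, and return the
    simplest degree whose score is within eps of the best."""
    errs = [sse(polyfit(*fit_data, d), *eval_data) for d in range(max_degree + 1)]
    best = min(errs)
    return next(d for d, e in enumerate(errs) if e <= best + eps)


train = ([0, 1, 2, 3], [0, 1.5, 1.5, 3])       # y = x plus offsets 0, 0.5, -0.5, 0
held_out = ([0.5, 1.5, 2.5], [0.5, 1.5, 2.5])  # fresh points from y = x

print(select_degree(train, train, max_degree=3))     # -> 3
print(select_degree(train, held_out, max_degree=3))  # -> 1
```

When the training points are reused as the test set, the interpolating cubic has (numerically) zero error and is selected. On the held-out points it overshoots between the training inputs, and the straight line wins.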
14. Change Detection / Model Correction
Is inconsistency caused by noise (or minor factors) or by changes in the underlying model?
– Applications: medical diagnostics, intrusion detection, network analysis, finance
http://www.satimagingcorp.com/galleryimages/high-resolution-landsat-satellite-imagery-oman.jpg

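One standard way to separate the two cases is to monitor the model's residuals over time: noise and isolated outliers average out, while a change in the underlying model produces a persistent shift. The sketch below swaps in Page's two-sided CUSUM test for that purpose; the `drift` and `threshold` values and the residual sequences are illustrative assumptions, not taken from the talk.

```python
def cusum_change_point(residuals, drift=0.5, threshold=3.0):
    """Two-sided CUSUM on model residuals (observed minus predicted).

    Accumulates deviations larger than `drift` in each direction and returns
    the first index where either sum exceeds `threshold`, or None if the
    inconsistency looks like noise.
    """
    pos = neg = 0.0
    for i, r in enumerate(residuals):
        pos = max(0.0, pos + r - drift)  # evidence of an upward shift
        neg = max(0.0, neg - r - drift)  # evidence of a downward shift
        if pos > threshold or neg > threshold:
            return i
    return None


noise = [0.4, -0.4] * 10           # small, zero-mean disagreement
spike = [0.0, 0.0, 2.0, 0.0, 0.0]  # one isolated outlier
shift = [0.0] * 5 + [1.5] * 5      # underlying model changes at index 5

print(cusum_change_point(noise))  # -> None
print(cusum_change_point(spike))  # -> None
print(cusum_change_point(shift))  # -> 8
```

The detected index lags the true change (index 5) by a few samples; a smaller threshold reacts faster but raises more false alarms on noisy residuals.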
15. Conclusion
• Inconsistency could be useful for:
– Hypothesis Learning
– Model Selection
– Model Correction

Neil Rubens
Assistant Professor
Active Intelligence Group
Laboratory for Knowledge Computing
University of Electro-Communications
Tokyo, Japan
http://ActiveIntelligence.org