Presentation for Cognitive Systems Institute Group Speaker Series call on October 15, 2015. Elham Khabiri is a Researcher at the IBM TJ Watson Research Center.
Domain Scoping for Subject Matter Experts by Elham Khabiri
1. Domain Scoping for Subject Matter Experts
Elham Khabiri, Matthew Riemer, Fenno F. Heath III, Richard Hull
Oct 15
2. Introduction
• Exploring social media is essential for SMEs
– To discover relevant content around a subject
– To analyze sentiment about a subject
– Example: possible changes to the Common Core by government, and how they are reflected in social media
– What vocabularies do articles and media use to address relevant discussions?
• Provide a tool to define the scope of a vocabulary
– Suggest to SMEs which vocabularies to search for in the news and social media
– Construct a “domain model”: a family of vocabularies and extractors
– Different algorithms and datasets are offered by the tool
• Datasets: Common Crawl, BoardReader News and Forums, Google News
• Methods: Collocation, TF-IDF, NN (Word2Vec and GloVe)
4. Different Methods and Datasets
Generates terms that occur frequently with one or more of the seed terms, with a frequency that is relatively high compared with how often they occur in “all” documents.
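As a rough illustration of the idea on this slide, the Python sketch below scores each candidate term by how much more often it appears in documents that contain a seed term than in all documents. The function name `related_terms`, the whitespace tokenizer, the document-frequency ratio, and the toy corpus are all assumptions made for illustration; the slides do not give the tool's actual scoring formula.

```python
from collections import Counter

def related_terms(docs, seeds, min_seed_docs=2):
    """Rank terms by (share of seed documents) / (share of all documents).

    Hypothetical helper: a ratio above 1 means the term is more common
    alongside the seeds than in the corpus as a whole.
    """
    seeds = {s.lower() for s in seeds}
    # Assumption: naive lowercase whitespace tokenization, one set per document.
    tokenized = [set(d.lower().split()) for d in docs]
    seed_docs = [toks for toks in tokenized if toks & seeds]
    if not seed_docs:
        return []
    df_all = Counter(t for toks in tokenized for t in toks)
    df_seed = Counter(t for toks in seed_docs for t in toks)
    scores = {}
    for term, n_seed in df_seed.items():
        # Skip the seeds themselves and terms seen too rarely with them.
        if term in seeds or n_seed < min_seed_docs:
            continue
        rate_seed = n_seed / len(seed_docs)
        rate_all = df_all[term] / len(tokenized)
        scores[term] = rate_seed / rate_all
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy corpus, invented for this example.
docs = [
    "common core standards change testing",
    "the teachers debate common core standards",
    "the game score and the weather",
    "the weather and the testing season",
    "the common core and the teachers",
]
ranked = related_terms(docs, ["core"])
for term, score in ranked:
    print(f"{term}\t{score:.2f}")
```

In this toy corpus, "common", "standards", and "teachers" score above 1, while the function word "the" scores below 1 because it is just as frequent outside the seed documents.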
5. Different Methods and Datasets
Generates terms that are “similar” to the seed terms. The similarity metric is based on an analysis of a large family of news articles from 2013 that were gathered by Google. 1M unigrams, 1M bigrams, 1M trigrams.
6. Different Methods and Datasets
GloVe (Pennington, Socher, Manning): Generates terms that are “similar” to the seed terms. The similarity metric is based on an analysis of a large family of web documents from the last 7 years that were gathered by Common Crawl. 3M unigrams, 10K bigrams.
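To make "similar to the seed terms" concrete, here is a minimal Python sketch of nearest-neighbor lookup over word vectors. It assumes cosine similarity as the metric and uses tiny hand-made 3-dimensional vectors in place of real pretrained embeddings; the slides state neither the metric nor the vector dimensions, and `nearest_terms` and `toy_vectors` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_terms(vectors, seeds, k=3):
    """Return the k terms closest to the average of the seed vectors."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        return []
    # Average the seed vectors component by component.
    centroid = [sum(col) / len(seed_vecs) for col in zip(*seed_vecs)]
    scored = [
        (term, cosine(centroid, vec))
        for term, vec in vectors.items()
        if term not in seeds
    ]
    return sorted(scored, key=lambda kv: -kv[1])[:k]

# Made-up vectors standing in for pretrained embeddings.
toy_vectors = {
    "education": [0.9, 0.1, 0.0],
    "curriculum": [0.8, 0.2, 0.1],
    "school": [0.85, 0.15, 0.05],
    "rocket": [0.0, 0.1, 0.95],
    "booster": [0.1, 0.0, 0.9],
}
neighbors = nearest_terms(toy_vectors, ["education"], k=2)
for term, sim in neighbors:
    print(f"{term}\t{sim:.3f}")
```

With several seed terms, averaging their vectors is one simple way to query for the group as a whole; querying each seed separately and merging the results is an equally plausible design.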
7. Different Methods and Datasets
Generates term pairs that occur with one or more of the seed terms, with a frequency that is relatively high compared with how often they occur in “all” documents.
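The same relative-frequency idea can be sketched for term pairs. The Python example below treats a "term pair" as two adjacent words and compares how often each pair appears in seed documents versus all documents. The adjacency definition, the name `related_pairs`, the scoring ratio, and the toy corpus are assumptions for illustration, not the tool's documented behavior.

```python
from collections import Counter

def bigrams(tokens):
    """Adjacent word pairs in a token list."""
    return list(zip(tokens, tokens[1:]))

def related_pairs(docs, seeds, min_seed_docs=2):
    """Rank adjacent pairs by (share of seed docs) / (share of all docs)."""
    seeds = {s.lower() for s in seeds}
    # Assumption: naive lowercase whitespace tokenization.
    tokenized = [d.lower().split() for d in docs]
    pairs_per_doc = [set(bigrams(toks)) for toks in tokenized]
    has_seed = [bool(seeds & set(toks)) for toks in tokenized]
    n_seed_docs = sum(has_seed)
    if n_seed_docs == 0:
        return []
    df_all = Counter(p for pairs in pairs_per_doc for p in pairs)
    df_seed = Counter(
        p for pairs, flag in zip(pairs_per_doc, has_seed) if flag for p in pairs
    )
    scores = {}
    for pair, n in df_seed.items():
        # Skip pairs that contain a seed term or are too rare near seeds.
        if seeds & set(pair) or n < min_seed_docs:
            continue
        scores[pair] = (n / n_seed_docs) / (df_all[pair] / len(docs))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy corpus, invented for this example.
docs = [
    "education reform debate over state standards in the news",
    "state standards shape education policy",
    "the weather report in the news today",
    "new state standards and education funding in the news",
    "the weather report in the news again",
]
ranked_pairs = related_pairs(docs, ["education"])
for pair, score in ranked_pairs:
    print(" ".join(pair), f"{score:.2f}")
```

Here the pair "state standards" ranks first with a ratio above 1, while generic pairs such as "in the" fall below 1 because they are common throughout the corpus.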
8. Finding Relevant Entities: Using Wikipedia
Phase 1: Disambiguation
Phase 2: Find Synonyms
Wikipedia Categories (C) and Wikipedia Entities (E):
C: Youth
E: World Education Services
C: Education issues
C: History of Education
C: Education reform
E: Shlomo Dovrat
E: Education Reform
E: Common Core State Init.
Disambiguation: Atlas V is called Common Core Booster
10. Using Wikipedia
American educator, proponent of homeschooling
C: Youth
E: World Education Services
C: Education issues
C: History of Education
C: Education reform
E: Shlomo Dovrat
E: School-to-work transition
E: Education reform