A Multifaceted Look at Faceting for Relevant Search

A Multifaceted Look At Faceting - Using Facets
“Under the Hood” to Facilitate Relevant Search
Ted Sullivan
Senior Solutions Architect, Lucidworks Professional Services

Agenda
• Facet history - why Solr faceting rocks
• Text - unstructured? I hardly think so!
• Using facets as context mining engines

Facets are Metadata
Facet: “A particular aspect or feature of something.”
Metadata: “Data about data” - attributes, aspects, descriptors, features, properties, traits
Facets == Metadata
Metadata Semantics: what, where, when, why
name, size, shape, color, material, texture
manufacturer, number of outlets, voltage, is pre-assembled …
address, phone number, birth date, user rating
hand size, likes country music, believes in climate change …
Metadata Dependencies: Some metadata fields depend on “what” the “thing” is
e.g. People have different attributes than Toaster Ovens

A Little History: Facet Technology Terminology
Verity K2 Parametric Search
Fast ESP Navigators
MS Fast Refiners
Endeca Dimensions
Solr Facets
Google (& Autonomy too) Facets? Facets? We don’t need no stinkin’ facets!

Traditional Uses of Facets
Faceted Navigation - aka “Refinement” / “Drill down”
Allows the initial query to be ambiguous without requiring the user to “rethink” what to search for.
Neatly handles the old “No Results Found” trial-and-error bugaboo.
Downsides:
Facets should not be used as a ‘band-aid’ for poorly tuned relevance!
• The “need” for faceted navigation forces us to favor recall over precision, because you have to drill in to something. (Maybe this is why Google avoids them!)
• If users have to use facets to drill in to what they really want, why search in the first place? Why not just browse?
• Facet ‘noise’: false positives due to poor precision / high recall cause weird outliers (ML techniques like Signal Aggregation that improve relevance do not help here).

Facet Analytics - Maturing Rapidly
Visualization
Facets show a high-level or global ‘context’ of what the result set is “about”.
Dashboards - Search-Driven BI:
Eye Candy: Pie charts, bar charts, histograms, etc. make use of the basic statistical nature of facets (i.e. counts, but now many other statistics too: mean, median, standard deviation, skewness, etc.).
Data Analytics:
Solr now enhances facet statistics to include many more useful mathematical calculations.
Use basic analytics such as mean, standard deviation and the like to do more complex analytics similar to what is done with databases in OLAP “cubes”.
Time-series: Range Facets on a Date-Time (Trie)Field
Advantage - analytics are search driven, so the “cube” can change with the query.

Solr Facets are Dynamic not Static
In other search engines like Verity, Fast or Endeca:
Facet values are computed at index time - thereby making them Static at query time.
Lucene did not have faceting originally
-Solr added faceting “on the fly” - i.e. at query time (before Lucene added it)
-Solr faceting is thus Dynamic!
The main hurdle to doing it this way is to make it FAST (mission accomplished - thanks Yonik!)
Although this would seem to be what we engineers call a "bolt on"
- in hindsight this was a very fortuitous evolutionary path!!
Once you do this, there are serious advantages over index-time faceting!!
One main advantage is flexibility
-In Solr - you can facet on just about anything - even things that weren’t thought about when
the collection was designed (function queries - extensible with ValueSource impls!)
-Good, good, good!!!

Very Brief Survey of Solr Faceting Methods
Metadata fields - prefer non-tokenized field types
(you can facet on tokenized fields too - but why would you want to?)
enum method => uses cached filter queries (the filterCache)
fc method => uses the FieldCache (it now uses DocValues)
facet query
facet prefix
facet range
function queries and Value Sources
pivot facets
excluding and tagging
JSON Facets
Facet performance tuning (gotchas)
- I could talk about this at length but … nah! Read the Wikis!
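
To make the survey concrete, here is a minimal sketch of how several of these facet types could be combined in one Solr request. The parameter names are standard Solr request parameters; the field names (color, brand, price, product_type) and the values are hypothetical and only serve as illustration.

```python
from urllib.parse import urlencode

def build_facet_params(query):
    """Return (name, value) pairs for a Solr /select request with facets."""
    return [
        ("q", query),
        ("facet", "true"),
        # Tagging and excluding: counts for other colors stay visible
        # even after the user has filtered on one color.
        ("fq", "{!tag=colorTag}color:red"),
        ("facet.field", "{!ex=colorTag}color"),
        ("f.color.facet.method", "enum"),        # enum: one cached filter per value
        ("facet.field", "brand"),
        ("f.brand.facet.prefix", "Le"),          # only values starting with "Le"
        ("facet.query", "price:[* TO 100]"),     # an arbitrary query as a facet
        ("facet.range", "price"),
        ("f.price.facet.range.start", "0"),
        ("f.price.facet.range.end", "1000"),
        ("f.price.facet.range.gap", "100"),
        ("facet.pivot", "product_type,color"),   # nested (pivot) counts
    ]

# Hypothetical usage: append this string to a collection's /select URL.
query_string = urlencode(build_facet_params("sofa"))
```

A list of pairs is used instead of a dict because parameters such as facet.field may repeat.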

Language Semantics
Nouns, Verbs, Adjectives, Adverbs, Prepositions, etc.
What type of thing is it? Car
What’s its name? Lamborghini Aventador
Where is it made? Italy
How much horsepower does it have? A helluva lot
How fast does it go? Very
How much does it cost? If you have to ask …
My Search Philosophy:
Humans use language to search because that is how we reason about things.
Search engines need to do a better job of understanding language to better help us find the things we are looking for.
Search index schemas describe a machine-oriented / data-centric view of things
- we want to translate that to and from language-centric views
- from a search engine perspective, descriptive text is “unstructured” data - but not to us!

Metadata and Text Transforms
Metadata: Data-centric view of things <=> Text: Language-centric view of things
Metadata terms are embedded in language
Compose descriptive text about a thing from its attributes or properties
-> Create linguistic expressions from metadata
Deduce attributes or properties of a thing from descriptive text
-> Compute metadata by linguistic analyses of text
Search problem - match terms in query with things in index
-> Knowledge of word meanings is power!!!
-> Facet metadata constitutes knowledge that can be leveraged!!

FACETS ARE CONTEXT DISCOVERY TOOLS
Lemma 1: Similar things occur in similar contexts
Lemma 2: Facets are context exploration tools
Assertion: Facets can be used to find similar things

Exploiting Facet Metadata
Facets provide a sort of global metadata CONTEXT for a search result set
In addition to faceting, how can we exploit metadata to enhance search?
❖ Turning facet metadata inside-out: Query Autofiltering
❖ Using Facets to build a contextual typeahead suggester:
• Pivot facets to construct phrases from structured data.
• Extract related information using facets at index time to enable security trimming and dynamic boosting
❖ Using Facets for text analytics to generate better facets
• Facet ratios of positive and negative queries on key terms -> detects “key term clusters”
• Document clustering using key term cluster vectors -> detects key term categories

Example: Detecting User Intent in eCommerce
Separating the ‘What’ from the ‘What about’ (i.e. a Thing vs. a Really BIG Thing)
microwave safe dishes: ‘microwave safe’ - adjective phrase
compact microwave oven: ‘microwave oven’ - noun phrase
microwave: ‘microwave’ - noun - contraction for ‘microwave oven’
coffee filter: ‘coffee filter’ - noun-noun phrase - a filter
coffee: ‘coffee’ - noun - a beverage
coff
coffee table: ‘coffee table’ - noun-noun phrase - table
coffee colored sheets: ‘coffee colored’ - adjective phrase
coffee ice cream: coffee flavored ice cream
milk chocolate: a type of chocolate
chocolate milk: flavored milk

Query Autofiltering
Uses field values to generate a “reverse lookup map” that maps values to the fields that contain them.
- Inverts the “uninverted map” - ah … another type of inverted map - values -> fields
- Uses the Lucene SynonymMap Finite State Transducer (FST) implementation
Uses this map to parse the query to find terms in the query related to specific metadata fields.
Example: ‘red sofa’ maps red => color, sofa => product_type
Selects the longest contiguous phrase in its lexicon to match against parts of the query
If you have ‘coffee’ and ‘coffee filter’ in the lexicon (i.e. the Solr collection), the query ‘coffee filter’ will only match ‘coffee filter’
Can construct either a Solr filter query (fq) or boost query (bq) using this information.
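
The reverse lookup and longest-phrase matching can be sketched in a few lines. This is only an illustration under stated assumptions: a plain Python dict stands in for the Lucene SynonymMap FST mentioned on the slide, the function names and sample documents are hypothetical, and a phrase found in several fields simply produces one clause per field.

```python
def build_reverse_map(docs, fields):
    """Map each lowercased field value to the set of fields that contain it."""
    reverse = {}
    for doc in docs:
        for field in fields:
            values = doc.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                reverse.setdefault(value.lower(), set()).add(field)
    return reverse

def autofilter(query, reverse):
    """Greedy longest-contiguous-phrase match of query tokens against the map."""
    tokens = query.lower().split()
    filters, leftover = [], []
    i = 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):    # try the longest phrase first
            phrase = " ".join(tokens[i:j])
            if phrase in reverse:
                for field in sorted(reverse[phrase]):
                    filters.append(f'{field}:"{phrase}"')
                i = j
                break
        else:
            leftover.append(tokens[i])          # no field knows this token
            i += 1
    return filters, leftover

# Hypothetical sample collection.
docs = [
    {"color": "red", "product_type": "sofa"},
    {"product_type": "coffee filter"},
    {"product_type": "coffee"},
]
reverse_map = build_reverse_map(docs, ["color", "product_type"])
red_sofa = autofilter("red sofa", reverse_map)
coffee_filter = autofilter("coffee filter", reverse_map)
```

The returned clauses could be sent as fq (strict filtering) or bq (boosting) parameters; leftover tokens would stay in the main query.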

Query Autofiltering - Knowledge Mining
Doing this is a way of exploiting the field/value relationships in the collection metadata.
What it effectively does is extract the knowledge that is built in to your collection through its facet metadata and apply that knowledge to parsing the query:
• It knows that ‘red’ is a color because ‘red’ is a value in the ‘color’ field.
• It ‘short circuits’ the search-then-drill-in paradigm -> just search!
• But as the telemarketers say: “Wait! There’s more! …”
The knowledge about what terms mean and the properties of the term’s field (single-valued vs. multi-valued) provides other opportunities that can be exploited!

Query Autofiltering - Language Logic
Can provide a semblance of “natural language processing” by breaking a query into semantic parts and applying those appropriately
Natural Language Boolean vs Mathematical Boolean
Language usage of boolean terms like ‘AND’ and ‘OR’ is contextual!!
“show me green or blue shirts” is equivalent to “show me green and blue shirts”
The user means ‘both’ in each case, so ‘and’ and ‘or’ are synonyms in this usage context!
but in “show me fast and inexpensive cars” - ‘and’ means AND!
It depends on field cardinality: here ‘color’ is single-valued and ‘attributes’ is multi-valued.
Users understand this intuitively - search engines don’t, but Query Autofilter can get this right!
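
The cardinality rule can be written down as a small decision function. This is a sketch of the idea only, not the Query Autofilter code; the function name and its parameters are assumptions made for illustration.

```python
def combine_values(field, values, multi_valued, conjunction):
    """Join values for one field with the operator the user most likely meant."""
    if not multi_valued:
        # A single-valued field cannot hold two values at once, so
        # "green and blue" and "green or blue" both have to mean OR.
        operator = "OR"
    else:
        # A multi-valued field can satisfy several values together,
        # so the user's own conjunction is taken literally.
        operator = "AND" if conjunction == "and" else "OR"
    joined = f" {operator} ".join(f'"{v}"' for v in values)
    return f"{field}:({joined})"

shirts = combine_values("color", ["green", "blue"], False, "and")
cars = combine_values("attributes", ["fast", "inexpensive"], True, "and")
```

Whether a field is multi-valued would come from the collection schema in practice.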

Query Autofiltering - Extensions - Query Patterns
Once you know what individual query terms and phrases mean, you can exploit this by creating templates for popular query patterns
Query Pattern: Terms + Facet fields that will be captured by Query AutoFilter
Query Template: Query template with placeholders for field values, filled in if the user query matches the pattern.
Example: Music Ontology
User Query: Who’s in The Who
Query Pattern: (who's in,was in,were in,member of,members of)|${hasPerformer_ss}
Query Template: memberOfGroup_ss:${hasPerformer_ss}
User Query: Songs Beatles Covered
Query Pattern: (song,songs)|${hasPerformer_ss}|covered
Query Template: hasPerformer_ss:${hasPerformer_ss} AND version_s:Cover
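
A simplified matcher for the pattern syntax shown above could look like the following. It is a sketch under assumptions: the pattern is read as "|"-separated segments, each either a parenthesized comma list of literal phrases or a ${field} placeholder, and a small hypothetical lexicon dict stands in for the field values that Query Autofilter would look up in the collection.

```python
import re

def match_pattern(query, pattern, lexicon):
    """Return {field: value} if the query matches the pattern, else None."""
    remaining = query.lower().replace("’", "'").strip()
    captured = {}
    for segment in pattern.split("|"):
        placeholder = re.fullmatch(r"\$\{(\w+)\}", segment)
        if placeholder:
            field = placeholder.group(1)
            candidates = lexicon.get(field, [])
        else:
            field = None
            candidates = segment.strip("()").split(",")
        # Prefer the longest candidate that the remaining text starts with.
        for cand in sorted(candidates, key=len, reverse=True):
            low = cand.lower()
            if remaining == low or remaining.startswith(low + " "):
                if field:
                    captured[field] = cand
                remaining = remaining[len(low):].strip()
                break
        else:
            return None
    return captured if not remaining else None

def fill_template(template, captured):
    """Replace each ${field} placeholder with the quoted captured value."""
    return re.sub(r"\$\{(\w+)\}", lambda m: f'"{captured[m.group(1)]}"', template)

# Hypothetical lexicon of known performer names.
lexicon = {"hasPerformer_ss": ["The Who", "Beatles"]}
who = match_pattern(
    "Who’s in The Who",
    "(who's in,was in,were in,member of,members of)|${hasPerformer_ss}",
    lexicon,
)
who_query = fill_template("memberOfGroup_ss:${hasPerformer_ss}", who)
```

Quoting the captured value keeps multi-word names such as "The Who" together as a phrase in the generated query.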

Canned Demo - Who’s In The Who

Typeahead - Priming the Pump with Pivot Facet Patterns
Construct semantically meaningful phrases from multiple metadata fields
✦ Inverse of Query AutoFiltering - creates suggestions that we know how to process!!
✦ Uses Solr Pivot Facets to translate field patterns to suggested query phrases
Examples:
${hasPerformer_ss} ${Recording_Type_s}s => Beatles Songs, Led Zeppelin Songs, Billy Joel Songs, Frank Zappa Songs etc.
${genres_ss} ${Musician_Type_ss}s => Classical Pianists, Hard Rock Guitarists, Jazz Drummers
${Recording_Type_s}s ${hasPerformer_ss} Covered (with fq version_s:Cover) => Songs Jimi Hendrix Covered
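
As a rough sketch of the phrase construction, the function below walks a nested pivot structure shaped like the entries Solr returns under facet_pivot (each with field, value, count and an optional nested pivot list) and fills a phrase template. The sample data and counts are made up for illustration, and the function name is hypothetical.

```python
def phrases_from_pivot(pivot_entries, template):
    """Expand nested pivot entries into (phrase, count) suggestions."""
    phrases = []

    def walk(entries, values):
        for entry in entries:
            current = dict(values, **{entry["field"]: str(entry["value"])})
            children = entry.get("pivot")
            if children:
                walk(children, current)          # descend to the next field
            else:
                phrase = template
                for field, value in current.items():
                    phrase = phrase.replace("${" + field + "}", value)
                phrases.append((phrase, entry["count"]))

    walk(pivot_entries, {})
    return phrases

# Made-up sample shaped like a facet.pivot=genres_ss,Musician_Type_ss result.
sample_pivot = [
    {"field": "genres_ss", "value": "Jazz", "count": 12, "pivot": [
        {"field": "Musician_Type_ss", "value": "Drummer", "count": 5},
        {"field": "Musician_Type_ss", "value": "Pianist", "count": 7},
    ]},
]
suggestions = phrases_from_pivot(sample_pivot, "${genres_ss} ${Musician_Type_ss}s")
```

Because every suggestion is generated from field values, Query Autofiltering can map it back to the same fields when the user selects it.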

Building a Suggester with Dynamic Context
Assertion: Facets can be used to find similar things.
Example: John Lennon and Paul McCartney share many attributes, activities and group memberships -> They are closely related entities.
Search Agendas:
Users tend to have some high-level goal when searching (e.g. find out information about The Beatles)
Agendas can change in a session, but it is likely that queries issued within a short period of time will have a similar goal.
Conclusion: Meta-information from facets can be used to associate similar things or concepts within a search session.

Building a Suggester with Dynamic Context
Suggester Builder Design (Fusion Connector)
Uses Facet Queries against a Content Collection to create additional metadata for the Suggester or Typeahead Collection.
This contextual metadata can then be used for:
• Security Trimming of Typeahead suggestions
• Dynamic boosting of similar suggestions within a user session

Building a Suggester with Dynamic Context
Bring back other fields in addition to the displayed suggestion text (i.e., the ones that were calculated using faceting)
If a query is used to search, temporarily store its associated metadata in a circular cache on the browser.
When submitting the next typeahead query, add the cached information from the queue as boost queries.
Type ‘j’ - get back:
Jai Johnny Johanson Bands
Jai Johnny Johanson Groups
J.J. Johnson
Jai Johnny Johanson
Juke Joint Jezebel
Juke Joint Jimmy
Just searched for ‘Paul McCartney’, then type ‘j’:
John Lennon
John Lennon Songs
John Lennon Songs Covered
James P Johnson Songs (?)
John Lennon Originals
Hey Jude

Building a Suggester with Dynamic Context
Paul McCartney’s “Meta-informational Context”:
genres_ss: Rock, Rock & Roll, Soft Rock, Pop Rock
hasPerformer_ss: Beatles, Paul McCartney, José Feliciano, Jimi Hendrix, Joe Cocker, Aretha Franklin, Bon Jovi, Elvis Presley (… and many more)
composer_ss: Paul McCartney, John Lennon, Ringo Starr, George Harrison, George Jackson, Michael Jackson, Sonny Bono
memberOfGroup_ss: Beatles, Wings
Dynamic Boost Query:
genres_ss:"Rock"^50 genres_ss:"Rock & Roll"^50 genres_ss:"Soft Rock"^50 genres_ss:"Pop Rock"^50
hasPerformer_ss:"Beatles"^50 hasPerformer_ss:"Paul McCartney"^50 hasPerformer_ss:"José Feliciano"^50 hasPerformer_ss:"Jimi Hendrix"^50
composer_ss:"Paul McCartney"^50 composer_ss:"John Lennon"^50 composer_ss:"Ringo Starr"^50 composer_ss:"George Harrison"^50
memberOfGroup_ss:"Beatles"^50 memberOfGroup_ss:"Wings"^50
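
The circular cache and the boost query assembly might be sketched as follows. The slides place the cache in the browser; this Python version only illustrates the logic. The class name, the cache size and the cap of four values per field are assumptions (the cap mirrors the truncated lists in the example boost query).

```python
from collections import deque

class SessionContext:
    """Keeps metadata from the last few searches (a small circular cache)."""

    def __init__(self, max_queries=3):
        # deque with maxlen drops the oldest entry when a new one arrives.
        self.recent = deque(maxlen=max_queries)

    def remember(self, metadata):
        """Store the facet-derived metadata of a suggestion that was searched."""
        self.recent.append(metadata)

    def boost_query(self, boost=50, max_values_per_field=4):
        """Build a bq string from the cached metadata, without duplicates."""
        clauses = []
        for metadata in self.recent:
            for field, values in metadata.items():
                for value in values[:max_values_per_field]:
                    clause = f'{field}:"{value}"^{boost}'
                    if clause not in clauses:
                        clauses.append(clause)
        return " ".join(clauses)

context = SessionContext()
context.remember({"memberOfGroup_ss": ["Beatles", "Wings"]})
bq = context.boost_query()
```

The resulting string would be sent as the bq parameter of the next typeahead request, so suggestions sharing this context rise to the top.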

Text Mining Analyses
Problem: Metadata needs to be improved for useful application of QAF (Query Autofiltering) in the Real World
Case 1: Extracting product type and product attributes metadata from short product descriptions in eCommerce data - dealing with precision and recall
Case 2: Large text documents. Want to extract keywords and assign categories to documents.
Interesting properties of facets when directed towards unstructured text:
Facet ratios of positive and negative queries yield “keyword clusters”
Document clustering of keyword cluster vectors gives crisp categories

Auto phrasing vs. Auto filtering
Auto Phrasing
- Multi-term phrases that refer to a single entity.
- Used as a workaround to Solr “Multi-term synonym problem”
- That is now fixed (as of 6.4.1 - thanks Steve Rowe!)
- Is the Auto phrasing solution now obsolete?
- Answer: NOT!!! That was exploiting a side effect of what it does!
- Uses knowledge from a phrase list to determine what is an auto phrase
- Works on tokenized text fields (implemented as a Lucene TokenFilter)
Query Auto Filtering
- Utilizes information from non-tokenized text fields - inherently solves multi-term problem
Strategy for “unstructured text”:
Use auto phrasing to extract phrase metadata (keywords) from unstructured text
This metadata can then be consumed by Query Autofilter at search time.
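
The core idea of auto phrasing, merging known multi-term phrases into single tokens, can be sketched outside Lucene like this. It is not the TokenFilter implementation; the function name is hypothetical, and joining with underscores is an assumption chosen to resemble tokens such as stop_the_world that appear in the cluster output later in the deck.

```python
def auto_phrase(tokens, phrases):
    """Merge known multi-term phrases into single tokens, longest match first."""
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    longest = max((len(p) for p in phrase_set), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try windows from the longest known phrase length down to 2 tokens.
        for n in range(min(longest, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrase_set:
                out.append("_".join(window))
                i += n
                break
        else:
            out.append(tokens[i].lower())   # not part of any known phrase
            i += 1
    return out

phrased = auto_phrase(
    "stop the world garbage collection pauses".split(),
    ["stop the world", "garbage collection"],
)
```

The merged tokens can then be stored as keyword metadata in a non-tokenized field for Query Autofilter to consume.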

Simple Keyword Analysis
Diagram: “Unstructured” Text -> Lucene Analyzer with Auto Phrasing Extensions -> Spark Job
Metadata we would like to have but don’t have - requires lots of manual curation == $$$
Have short descriptive text fields that can be mined to glean useful metadata such as product type, material, size.
Special Sauce Ingredients:
➡ Semantically pure lexicons (things, brands, attributes, dimensions, logos, materials) of key terms
➡ Auto phrasing-based Lucene Analysis to extract key terms and “stop phrases” (e.g. Mr Coffee)
➡ Expansions and Relations based on noun phrases in lexicon. Contextually aware management of precision and recall.
➡ Tricks to deal with “leather case for iPhone”, “DSLR camera with 50-mm lens”

Expansions and Relations
Motivation: eCommerce Use Case:
Search for ‘iPhone’ - get iPhone cases and iPhone chargers mixed in.
- Want to have both BUT want iPhones at the TOP of the result set.
=> TF/IDF doesn’t always deliver on this (can’t control relevance - you get what you get)
i.e. - want recall for up-sell opportunities - so relax precision a bit.
Relevance (what I want is on top) is still very important
Search for ‘iPhone case’
Now I want precision - just show me iPhone cases please ‘cause I already got a stinkin’ iPhone!!
Why else would I be looking for accessories for it???

Expansions and Relations
Noun phrases have structure:
end table, side table, dining room table, picnic table, coffee table, folding table => Are ALL types of tables
table cloth, table setting, table lamp, table chair => Are table related things.
Expansions - IS-A relationships
Phrases that end in ‘table’ are specific types of tables
classify ‘end table’ as ‘table’ too
=> search for ‘table’ returns all types of tables
=> search for ‘end table’ just returns end tables
Relations - IS-LIKE Relationships
Phrases that start with ‘table’ are table related things
Add table related things to fq for ‘table’ as OR list
Boost search term ‘table’ more than table related things - get both but tables are first
Table related things don’t have relations - search is more specific - just get that thing!
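
The two noun-phrase rules can be sketched as a classifier plus a query builder. This is an illustration under assumptions: the function names, the product_type field and the boost value are hypothetical, and it relies on the indexing step described above where ‘end table’ is also classified as ‘table’.

```python
def classify_phrases(head, phrases):
    """Split noun phrases into IS-A expansions and IS-LIKE relations of head."""
    expansions, relations = [], []
    for phrase in phrases:
        words = phrase.lower().split()
        if len(words) < 2:
            continue
        if words[-1] == head:
            expansions.append(phrase)    # "end table" IS-A table
        elif words[0] == head:
            relations.append(phrase)     # "table lamp" is a table related thing
    return expansions, relations

def build_query(head, relations, field="product_type", boost=10):
    """Filter on the head term OR its relations, and boost the head term."""
    terms = [head] + relations
    fq = f"{field}:(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    bq = f'{field}:"{head}"^{boost}'
    return fq, bq

expansions, relations = classify_phrases(
    "table", ["end table", "coffee table", "table lamp", "table cloth", "sofa"]
)
fq, bq = build_query("table", relations)
```

A search for a relation itself, such as ‘table lamp’, would get no OR list at all, which keeps that search precise.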

Unstructured Text - Oh My!
The problem of unstructured text is that it is … well, unstructured … or is it? (Linguists don’t think so!)
We search but don’t typically facet on unstructured text fields (i.e. tokenized fields).
Even though in Solr we can facet on anything
- Get all of the tokenized terms and their counts as facet values -> very high cardinality
- Absolutely useless for UI drill in - so this is basically a no-no at query time
=> But that is not all that facets are good for so … wait a minute (light-bulb moment)! <=
What if we DID facet on the tokens and used their stats to do some text analysis?
=> It turns out we can use facets to detect keywords in documents. <=
Keywords - Terms that occur in relatively few documents (but not too few).
- Tend to be important words in some subjects but not others
- i.e. their usage is highly contextual to a subject!
Keywords for the same subject area tend to occur together because they share the same context!
Facets are a great context mining tool!!
Sounds like a FIT!

Facet Ratios => Keyword Clustering
Method to my Madness:
• Tokenize text with auto phrasing, stop words and synonyms
- store tokens in a multi-valued field with DocValues
- (yes you can facet on a text field but it tends to hit a wall - 2M word limit on facet values)
• Using the /terms handler, get each term in the text field.
• Submit two queries
- one with text_field:[term] (positive Q)
- one with -text_field:[term] (negative Q)
• Calculate the following ratio:
ratio = (Facet counts (positive Q) / Total counts (positive Q)) / (Facet counts (negative Q) / Total counts (negative Q))
• Take the xlog(x) of this ratio (for better discrimination)
- for each term, take the best related terms above some threshold
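
The scoring step can be sketched as below, taking the four counts as plain numbers. Assumptions that are not in the slides: the natural logarithm is used, a zero negative count is smoothed to 0.5 so the ratio stays finite, and the function and variable names are hypothetical. The sample counts are made up.

```python
import math

def facet_ratio_score(pos_facet, pos_total, neg_facet, neg_total):
    """x * log(x) of the positive/negative facet frequency ratio."""
    if pos_total == 0 or neg_total == 0 or pos_facet == 0:
        return 0.0
    pos_rate = pos_facet / pos_total
    # Assumption: smooth a zero negative count so the ratio stays finite.
    neg_rate = max(neg_facet, 0.5) / neg_total
    x = pos_rate / neg_rate
    return x * math.log(x)

def related_terms(term_counts, pos_total, neg_total, threshold):
    """term_counts: {candidate: (count in positive Q, count in negative Q)}."""
    scored = {
        term: facet_ratio_score(pos, pos_total, neg, neg_total)
        for term, (pos, neg) in term_counts.items()
    }
    return sorted(
        ((term, score) for term, score in scored.items() if score > threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Made-up counts: 100 documents match the term, 1000 do not.
related = related_terms({"firewall": (20, 5), "the": (90, 900)}, 100, 1000, 1.0)
```

A word that is equally frequent in both result sets has a ratio of 1 and scores 0, so common words drop out without a stop list.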

Facet Ratios => Keyword Clusters
Authentication
1002.7227722772277 firewall
561.5247524752475 authorization
401.08910891089107 passwords
374.34983498349834 plugging
160.43564356435644 transport
88.81258840169731 tied
80.21782178217822 weblogic,bootstrap,computationally,finely,usernames
56.152475247524755 ssl
40.10891089108911 login,augments,dialog,encapsulated,fallback,privileged
34.87731381833836 kerberos
32.087128712871284 permission
28.64922206506365 password
26.739273927392738 acls,realm,conversely
25.523852385238524 streaming
23.874351720886374 enabling
23.174037403740375 remote
22.91937765205092 protocols
20.054455445544555 globally,bind,indirectly,redirects
18.51180502665651 ldap
16.043564356435642 proxy
14.585058505850585 memberships
14.465508845966564 permissions
13.369636963696369 protect
12.534034653465346 grained
11.918076379066479 linux
11.523002023959302 advice
11.45968882602546 authenticate
11.23049504950495 hash
11.184215536938309 message
10.106182271770484 plugins
10.027227722772277 header,controlled,crafting
9.43739079790332 username
8.913091309130913 logins

Facet Ratios => Keyword Clusters
Phase II - check related terms for correlation (count agreements in their related terms)
8984.39603960396 authorization
4411.980198019802 passwords
4010.891089108911 firewall
802.1782178217823 usernames
601.6336633663367 password
508.046204620462 realm
505.3722772277228 ssl
481.3069306930693 login
425.7715156130997 ldap
418.52776582006027 kerberos
320.8712871287129 bootstrap,finely
256.69702970297027 permission
216.98263268949847 permissions
213.9141914191419 acls
185.392299229923 remote
167.1204620462046 enabling
153.14311431143113 streaming
132.12347117064647 username
120.32673267326733 bind
106.95709570957095 conversely
53.478547854785475 logins
46.09200809583721 advice
39.269812336882225 requests
32.52073856034252 zookeeper
31.13477101134771 plugin
28.422165293452423 admin
26.739273927392738 bother
25.465975168945466 controls
24.68240670220868 native
22.716551301147813 require

Keyword Vector Document Clustering
Use the Keyword Vectors to compute distances between documents rather than raw TF/IDF
=> Higher Signal To Noise
Tokenizer -> Compute Keyword Vector -> K-Means Clustering
Cluster: 98
stump_the_chump: 15159.853372701356
stump: 12931.059994928455
prize: 12378.463050783357
sight: 2943.012345679012
tough: 2872.8905092427412
question: 2827.6045026881716
judge: 2353.9344100731737
submit: 2250.350305525309
session: 2147.8922671532514
panel: 1888.958487954128
hostetter: 1722.9000585471174
grant: 1600.741568627451
chump: 1558.9513516128222
lucene_revolution: 1353.774672198919
spot: 1211.5869933577087
award: 1048.082490095137
mock: 1005.0931680939833
conference: 903.0025141117053
muir: 878.7673037468955
seat: 870.9154155915239
hot: 799.5070748299321
Get a list of documents for each cluster - label the clusters ==> Document Category
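
The distance computation behind this clustering can be sketched as follows: each document is reduced to a sparse vector over detected keywords only, and documents are compared by cosine similarity. The weighting scheme (occurrences times keyword weight) and the function names are assumptions; the K-Means step itself is not shown and would normally come from a clustering library.

```python
import math

def keyword_vector(tokens, keyword_weights):
    """Sparse vector: keyword -> occurrences in the document * keyword weight."""
    vector = {}
    for token in tokens:
        if token in keyword_weights:         # non-keywords are ignored
            vector[token] = vector.get(token, 0.0) + keyword_weights[token]
    return vector

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up weights and documents for illustration.
weights = {"jvm": 2.0, "heap": 1.0, "prize": 1.5}
gc_doc = keyword_vector(["the", "jvm", "heap", "is", "full"], weights)
tuning_doc = keyword_vector(["tune", "the", "jvm", "heap"], weights)
contest_doc = keyword_vector(["win", "a", "prize"], weights)
```

Because only keywords contribute, two documents that share nothing but common words have a similarity of zero, which is one way to read "higher signal to noise".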

Keyword Vector Document Clustering
Cluster: 85
young_generation: 58393.71450722004
throughput_collector: 51879.272543859726
tenured_space: 45769.06697989158
young_generation_collector: 36786.321736596736
tenure: 33389.738840692735
stop_the_world: 31288.96145142277
concurrent_low_pause_collector: 29612.759802867382
useadaptivesizepolicy: 26927.686004351213
useparnewgc: 26819.583333333332
useparalleloldgc: 25450.34188034188
jvm: 25354.346667094775
young_space: 22546.65166222556
useparallelgc: 22168.126984126982
collector: 21226.31425547997
survivor_space: 20836.967617437283
heap: 18883.16487771459
garbage_collection: 18247.692641501046
garbage_collector: 17929.789619546355
command_line_options: 10111.764705882353
sweep: 9803.141574757969
