A Multifaceted Look At Faceting - Using Facets “Under the Hood” to Facilitate Relevant Search
Ted Sullivan
Senior Solutions Architect, Lucidworks Professional Services
Agenda
• Facet history - why Solr faceting rocks
• Text - unstructured? I hardly think so!
• Using facets as context mining engines
Facets are Metadata
Facet: “A particular aspect or feature of something.”
Metadata: “Data about data” - attributes, aspects, descriptors, features, properties, traits
Facets == Metadata
Metadata Semantics: what, where, when, why
name, size, shape, color, material, texture
manufacturer, number of outlets, voltage, is pre-assembled …
address, phone number, birth date, user rating
hand size, likes country music, believes in climate change …
Metadata Dependencies: Some metadata fields depend on “what” the “thing” is
e.g. People have different attributes than Toaster Ovens
A Little History: Facet Technology Terminology
Verity K2 Parametric Search
Fast ESP Navigators
MS Fast Refiners
Endeca Dimensions
Solr Facets
Google (& Autonomy too) Facets? Facets? We don’t need no stinkin’ facets!
Traditional Uses of Facets
Faceted Navigation - aka “Refinement” / “Drill down”
Allows initial query to be ambiguous without requiring the user to “rethink”
what to search for.
Neatly handles the old “No Results Found” trial-and-error bugaboo.
Downsides:
Facets should not be used as a ‘band-aid’ for poorly tuned relevance!!!
• The “need” for faceted navigation forces us to favor recall over precision, because you have to drill in to something. (Maybe this is why Google avoids them!)
• If users have to use facets to drill in to what they really want, why search in the first place? Why not just browse?
• Facet ‘noise’ - false positives due to poor precision / high recall cause weird outliers
(ML techniques like Signal Aggregation that improve relevance do not help here)
Visualization
Facets show a high-level or global ‘context’ of what the result set is “about”
Dashboards - Search Driven BI:
Eye Candy: Pie charts, bar charts, histograms, etc. make use of the basic statistical nature
of facets (i.e. counts, and now many more statistics: mean, median, standard deviation, skewness, etc.).
Data Analytics:
Solr now enhances facet statistics to include many more useful mathematical calculations.
Use basic analytics such as mean, standard deviation and the like to do more complex
analytics similar to what is done with Databases in OLAP “cubes”.
Time-series: Range Facets on a Date-Time (Trie)Field
Advantage - analytics are search driven so that the “cube” can change with the query
Facet Analytics - Maturing Rapidly
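As a rough illustration of the time-series and statistics points above, the sketch below builds the request parameters for a Solr date range facet plus the stats component. The field names `created_dt` and `price_f` and the query term are hypothetical; only the parameter names (`facet.range.*`, `stats`, `stats.field`) are standard Solr parameters.

```python
from urllib.parse import urlencode


def time_series_facet_params(query, date_field, start, end, gap, stats_field):
    """Build Solr request parameters for a date range facet with field stats."""
    return [
        ("q", query),
        ("rows", "0"),                      # only the facet/stats output is needed
        ("facet", "true"),
        ("facet.range", date_field),        # range facet over a date field
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        ("stats", "true"),                  # stats component: min, max, mean, stddev, ...
        ("stats.field", stats_field),
    ]


# Hypothetical field names; daily buckets over the last 30 days.
params = time_series_facet_params(
    "laptop", "created_dt", "NOW/DAY-30DAYS", "NOW/DAY", "+1DAY", "price_f"
)
query_string = urlencode(params)
```

Because the buckets and statistics are computed for whatever `q` matches, changing the query changes the “cube”, which is the search-driven advantage the slide describes.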
Solr Facets are Dynamic not Static
In other search engines like Verity, Fast or Endeca:
Facet values are computed at index time - thereby making them Static at query time.
Lucene did not have faceting originally
-Solr added faceting “on the fly” - i.e. at query time (before Lucene added it)
-Solr faceting is thus Dynamic!
The main hurdle to doing it this way is to make it FAST (mission accomplished - thanks Yonik!)
Although this would seem to be what we engineers call a "bolt on"
- in hindsight this was a very fortuitous evolutionary path!!
Once you do this, there are serious advantages over index-time faceting!!
One main advantage is flexibility
-In Solr - you can facet on just about anything - even things that weren’t thought about when
the collection was designed (function queries - extensible with ValueSource impls!)
-Good, good, good!!!
Very Brief Survey of Solr Faceting Methods
Metadata fields - prefer non-tokenized field types
(you can facet on tokenized fields too - but why would you want to?)
enum method => uses cached filter queries (the filterCache)
fc method => uses the FieldCache (it now uses DocValues)
facet query
facet prefix
facet range
function queries and Value Sources
pivot facets
excluding and tagging
JSON Facets
Facet performance tuning (gotchas)
- I could talk about this at length but … nah! Read the Wikis!
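To make the survey a little more concrete, here is a minimal sketch of request parameters that touch several of the methods listed above. The field names (`color_s`, `brand_s`, `price_f`) and the tag name `colorTag` are assumptions for illustration; the parameter names themselves are standard Solr faceting parameters.

```python
import json


def facet_survey_params():
    """Return Solr request parameters that exercise several facet features."""
    json_facet = {"top_brands": {"type": "terms", "field": "brand_s", "limit": 5}}
    return [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        # Tag the color filter so the color facet can exclude it (multi-select UI).
        ("fq", "{!tag=colorTag}color_s:red"),
        ("facet.field", "{!ex=colorTag}color_s"),
        # Field facet with an explicit method and a value prefix.
        ("facet.field", "brand_s"),
        ("f.brand_s.facet.method", "enum"),
        ("f.brand_s.facet.prefix", "Ac"),
        # Facet query: count of documents matching an arbitrary query.
        ("facet.query", "price_f:[0 TO 100]"),
        # Pivot (nested) facet over two fields.
        ("facet.pivot", "brand_s,color_s"),
        # JSON Facet API version of a terms facet.
        ("json.facet", json.dumps(json_facet)),
    ]
```

In a real request you would normally pick either the classic `facet.*` parameters or `json.facet`, not both; they are combined here only to show the shapes side by side.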
Language Semantics
Nouns, Verbs, Adjectives, Adverbs, Prepositions, etc.
What type of thing is it?            Car
What is its name?                    Lamborghini Aventador
Where is it made?                    Italy
How much horsepower does it have?    A hellova lot
How fast does it go?                 Very
How much does it cost?               If you have to ask …
My Search Philosophy:
Humans use language to search because that is how we reason about things.
Search engines need to do a better job of understanding language to better help us find the things we are looking for.
Search index schemas describe a machine-oriented / data-centric view of things
- want to translate that to and from language-centric views
- from a search engine perspective, descriptive text is “unstructured” data - but not to us!
Metadata and Text Transforms
Metadata: Data-centric view of things <=> Text: Language-centric view of things
Metadata terms are embedded in language
Compose descriptive text about a thing from its attributes or properties
-> Create linguistic expressions from metadata
Deduce attributes or properties of a thing from descriptive text
-> Compute metadata by linguistic analyses of text
Search problem - match terms in query with things in index
-> Knowledge of word meanings is power!!!
-> Facet metadata constitutes knowledge that can be leveraged!!
FACETS ARE CONTEXT DISCOVERY TOOLS
Lemma 1: Similar things occur in similar contexts
Lemma 2: Facets are context exploration tools
Assertion: Facets can be used to find similar things
Exploiting Facet Metadata
Facets provide a sort of global metadata CONTEXT for a search result set
In addition to faceting, how can we exploit metadata to enhance search?
❖ Turning facet metadata inside-out: Query Autofiltering
❖ Using Facets to build contextual typeahead suggester:
• Pivot facets to construct phrases from structured data.
• Extract related information using facets at index time to enable security trimming and dynamic boosting
❖ Using Facets for text analytics to generate better facets
• Facet ratios of positive and negative queries on key terms -> detects “key term clusters”
• Document clustering using key term cluster vectors -> detects key term categories
Example: Detecting User Intent in eCommerce
Separating the ‘What’ from the ‘What about’
(i.e. a Thing vs. a Really BIG Thing)
microwave safe dishes     ‘microwave safe’ - adjective phrase
compact microwave oven    ‘microwave oven’ - noun phrase
microwave                 ‘microwave’ - noun - contraction for ‘microwave oven’
coffee filter             ‘coffee filter’ - noun-noun phrase - a filter
coffee                    ‘coffee’ - noun - a beverage
coffee table              ‘coffee table’ - noun-noun phrase - a table
coffee colored sheets     ‘coffee colored’ - adjective phrase
coffee ice cream          coffee flavored ice cream
milk chocolate            a type of chocolate
chocolate milk            flavored milk
Query Autofiltering
Uses field values to generate a “reverse lookup map” that maps values to the fields that contain them.
- Inverts the “uninverted map” - ah … another type of inverted map - values -> fields
- Uses the Lucene SynonymMap Finite State Transducer (FST) implementation
Uses this map to parse the query to find terms in the query related to specific metadata fields.
Example: ‘red sofa’ maps
red => color
sofa => product_type
Selects the longest contiguous phrase in its lexicon to match against parts of the query
If we have ‘coffee’ and ‘coffee filter’ in the lexicon (i.e. the Solr collection), the query ‘coffee filter’ will only match ‘coffee filter’
Can construct either a Solr filter query (fq) or boost query (bq) using this information.
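The reverse lookup and longest-phrase matching described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses a plain dictionary instead of the Lucene SynonymMap FST, whitespace tokenization, and it keeps a single field per value. The function names are hypothetical.

```python
def build_reverse_lookup(field_values):
    """Map each lowercased field value to (field, original value)."""
    lookup = {}
    for field, values in field_values.items():
        for value in values:
            lookup[value.lower()] = (field, value)
    return lookup


def autofilter(query, lookup):
    """Split a query into filter clauses and leftover free-text tokens."""
    tokens = query.lower().split()
    filters, leftover = [], []
    i = 0
    while i < len(tokens):
        # Try the longest contiguous phrase starting at position i first.
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in lookup:
                field, value = lookup[phrase]
                filters.append(f'{field}:"{value}"')
                i = j
                break
        else:
            # No lexicon entry starts here; keep the token as free text.
            leftover.append(tokens[i])
            i += 1
    return filters, leftover
```

The returned clauses could be sent as `fq` (strict filtering) or as `bq` (boosting) depending on how much precision is wanted, as the slide notes.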
Query Autofiltering - Knowledge Mining
Doing this is a way of exploiting the field/value relationships in the collection metadata.
So what it effectively does is extract the knowledge that is built in to your collection due to the facet metadata that it contains, and apply that knowledge to parsing of the query:
• It knows that ‘red’ is a color because ‘red’ is a value in the ‘color’ field.
• It ‘short circuits’ the search-then-drill-in paradigm -> just search!
• But as the telemarketers say: “Wait! There’s more! …”
The knowledge about what terms mean and the properties of the term field (single-valued vs. multi-valued) provides other opportunities that can be exploited!
Query Autofiltering - Language Logic
Can provide a semblance of “natural language processing” by breaking a query into semantic parts and applying those appropriately
Natural Language Boolean vs Mathematical Boolean
Language usage of boolean terms like ‘AND’ and ‘OR’ is contextual!!
“show me green or blue shirts”
is equivalent to
“show me green and blue shirts”
The user means ‘both’ in each case so ‘and’ and ‘or’ are synonyms in this usage context!
but in
“show me fast and inexpensive cars”
- ‘and’ means AND!
Depends on field cardinality! If ‘color’ is single-valued, a shirt cannot be both green and blue, so the values are ORed; if ‘attributes’ is multi-valued, a car can be both fast and inexpensive, so the values are ANDed.
Users understand this intuitively - Search Engines don’t but Query Autofilter can get this right!
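A minimal sketch of that cardinality rule, assuming the schema tells us whether a field is multi-valued. The function name and the rule as coded here are an illustration of the idea on this slide, not the actual Query Autofilter implementation.

```python
def combine_values(field, values, multi_valued):
    """Join several values captured for one field into a single Solr clause.

    Single-valued field: a document can hold only one value, so OR them.
    Multi-valued field: a document can hold all of them, so AND them.
    """
    operator = " AND " if multi_valued else " OR "
    clauses = [f'{field}:"{value}"' for value in values]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + operator.join(clauses) + ")"
```

So “green and blue shirts” and “green or blue shirts” both become an OR over the single-valued `color` field, while “fast and inexpensive cars” becomes an AND over the multi-valued `attributes` field.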
Query Autofiltering - Extensions - Query Patterns
Once you know what individual query terms and phrases mean, you can exploit this by creating templates for popular query patterns
Query Pattern:   Terms + Facet fields that will be captured by Query AutoFilter
Query Template:  Query template with placeholders for field values, filled in if the user query matches the pattern.
Example: Music Ontology
User Query:      Who’s in The Who
Query Pattern:   (who's in,was in,were in,member of,members of)|${hasPerformer_ss}
Query Template:  memberOfGroup_ss:${hasPerformer_ss}
User Query:      Songs Beatles Covered
Query Pattern:   (song,songs)|${hasPerformer_ss}|covered
Query Template:  hasPerformer_ss:${hasPerformer_ss} AND version_s:Cover
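One way to read the pattern syntax above is as a `|`-separated sequence of segments, where each segment is either a list of literal alternatives or a `${field}` placeholder that must match a known value of that field. The sketch below implements that reading; the parsing rules, function names and the small value list are assumptions made for illustration, not the actual Query Autofilter code.

```python
import re


def match_pattern(query, pattern, field_values):
    """Return {field: value} if the query matches the pattern, else None."""
    remaining = query.lower().strip()
    captured = {}
    for segment in pattern.split("|"):
        placeholder = re.fullmatch(r"\$\{(\w+)\}", segment.strip())
        if placeholder:
            field = placeholder.group(1)
            candidates = field_values.get(field, [])
        else:
            field = None
            candidates = segment.strip().strip("()").split(",")
        # Prefer the longest candidate that matches at a word boundary.
        for candidate in sorted(candidates, key=len, reverse=True):
            text = candidate.strip().lower()
            if remaining == text or remaining.startswith(text + " "):
                if field:
                    captured[field] = candidate
                remaining = remaining[len(text):].strip()
                break
        else:
            return None  # this segment did not match
    return captured if remaining == "" else None


def fill_template(template, captured):
    """Substitute captured field values into the query template."""
    for field, value in captured.items():
        template = template.replace("${" + field + "}", f'"{value}"')
    return template
```

With a hypothetical performer list of `["The Who", "Beatles"]`, the query “who's in the who” fills the first template as `memberOfGroup_ss:"The Who"`.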
  
Canned Demo - Who’s In The Who
Typeahead - Priming the Pump with Pivot Facet Patterns
Construct semantically meaningful phrases from multiple metadata fields
✦ Inverse of Query AutoFiltering - creates suggestions that we know how to process!!
✦ Uses Solr Pivot Facets to translate field patterns to suggested query phrases
Examples:
${hasPerformer_ss} ${Recording_Type_s}s
=> Beatles Songs, Led Zeppelin Songs, Billy Joel Songs, Frank Zappa Songs etc.
${genres_ss} ${Musician_Type_ss}s
=> Classical Pianists, Hard Rock Guitarists, Jazz Drummers
${Recording_Type_s}s ${hasPerformer_ss} Covered (with fq version_s:Cover)
=> Songs Jimi Hendrix Covered
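A sketch of how a pivot facet result could be turned into suggestion phrases. It assumes the nested `field` / `value` / `count` / `pivot` structure that Solr returns under `facet_pivot`; the function name and the sample data are hypothetical.

```python
def phrases_from_pivot(pivot_rows, template):
    """Walk a Solr-style pivot facet tree and fill a phrase template per leaf."""
    phrases = []

    def walk(rows, values):
        for row in rows:
            current = dict(values)
            current[row["field"]] = row["value"]
            children = row.get("pivot")
            if children:
                walk(children, current)
            else:
                # Leaf: every field in the pattern has a value, so emit a phrase.
                phrase = template
                for field, value in current.items():
                    phrase = phrase.replace("${" + field + "}", str(value))
                phrases.append(phrase)

    walk(pivot_rows, {})
    return phrases


# Hypothetical response fragment for facet.pivot=hasPerformer_ss,Recording_Type_s
sample_pivot = [
    {"field": "hasPerformer_ss", "value": "Beatles", "count": 2,
     "pivot": [{"field": "Recording_Type_s", "value": "Song", "count": 2}]},
    {"field": "hasPerformer_ss", "value": "Led Zeppelin", "count": 1,
     "pivot": [{"field": "Recording_Type_s", "value": "Song", "count": 1}]},
]
suggestions = phrases_from_pivot(sample_pivot, "${hasPerformer_ss} ${Recording_Type_s}s")
```

Because only value combinations that actually co-occur in documents appear in the pivot tree, every generated phrase is one that Query Autofiltering can resolve back to field filters.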
Building a Suggester with Dynamic Context
Assertion: Facets can be used to find similar things.
Example: John Lennon and Paul McCartney share many attributes, activities and group memberships
-> They are closely related entities.
Search Agendas:
Users tend to have some high-level goal when searching (e.g. find out information about The Beatles)
Agendas can change in a session, but it is likely that queries issued within a short period of time will have a similar goal.
Conclusion:
Meta-information from facets can be used to associate similar things or concepts within a search session.
Building a Suggester with Dynamic Context
Suggester Builder Design (Fusion Connector)
Uses Facet Queries against a Content Collection to create additional metadata for the Suggester or Typeahead Collection.
This contextual metadata can then be used for:
• Security Trimming of Typeahead suggestions
• Dynamic boosting of similar suggestions within a user session
Building a Suggester with Dynamic Context
Bring back other fields in addition to displayed suggestion text (i.e., the ones that were calculated using faceting)
If a query is used to search, temporarily store its associated metadata in a circular cache on the browser.
When submitting the next typeahead query, add the cached information from the queue as boost queries.
Type ‘j’ - get back:
Jai Johnny Johanson Bands
Jai Johnny Johanson Groups
J.J. Johnson
Jai Johnny Johanson
Juke Joint Jezebel
Juke Joint Jimmy
Just searched for ‘Paul McCartney’, then type ‘j’ - get back:
John Lennon
John Lennon Songs
John Lennon Songs Covered
James P Johnson Songs (?)
John Lennon Originals
Hey Jude
Building a Suggester with Dynamic Context
Paul McCartney’s “Meta-informational Context”:
genres_ss:         Rock, Rock & Roll, Soft Rock, Pop Rock
hasPerformer_ss:   Beatles, Paul McCartney, José Feliciano, Jimi Hendrix, Joe Cocker, Aretha Franklin, Bon Jovi, Elvis Presley (… and many more)
composer_ss:       Paul McCartney, John Lennon, Ringo Starr, George Harrison, George Jackson, Michael Jackson, Sonny Bono
memberOfGroup_ss:  Beatles, Wings
Dynamic Boost Query:
genres_ss:"Rock"^50 genres_ss:"Rock & Roll"^50 genres_ss:"Soft Rock"^50 genres_ss:"Pop Rock"^50
hasPerformer_ss:"Beatles"^50 hasPerformer_ss:"Paul McCartney"^50 hasPerformer_ss:"José Feliciano"^50 hasPerformer_ss:"Jimi Hendrix"^50
composer_ss:"Paul McCartney"^50 composer_ss:"John Lennon"^50 composer_ss:"Ringo Starr"^50 composer_ss:"George Harrison"^50
memberOfGroup_ss:"Beatles"^50 memberOfGroup_ss:"Wings"^50
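The circular cache and the boost query above can be sketched together. The talk keeps the cache in the browser; this Python version only illustrates the logic, and the class name, cache size and default boost are assumptions.

```python
from collections import deque


class SessionContext:
    """Keep metadata for the last few searched suggestions (a circular cache)."""

    def __init__(self, max_entries=3):
        # A deque with maxlen silently drops the oldest entry when full.
        self.recent = deque(maxlen=max_entries)

    def remember(self, metadata):
        """Store the field -> values metadata of a suggestion that was searched."""
        self.recent.append(metadata)

    def boost_query(self, boost=50):
        """Build a Solr boost query (bq) string from the cached metadata."""
        clauses = []
        for metadata in self.recent:
            for field, values in metadata.items():
                for value in values:
                    clause = f'{field}:"{value}"^{boost}'
                    if clause not in clauses:  # avoid duplicate clauses
                        clauses.append(clause)
        return " ".join(clauses)
```

After a search for ‘Paul McCartney’, `remember()` would receive his facet-derived metadata, and the next typeahead request would carry `boost_query()` as its `bq`, which is how ‘John Lennon’ rises to the top when the user types ‘j’.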
Text Mining Analyses
Problem: Metadata needs to be improved for useful application of QAF (Query Autofiltering) in the real world
Case 1:
Extracting product type and product attributes metadata from short product descriptions in eCommerce data - dealing with precision and recall
Case 2:
Large text documents. Want to extract keywords and assign categories to documents.
Interesting properties of facets when directed towards unstructured text:
Facet ratios of positive and negative queries yield “keyword clusters”
Document clustering of keyword cluster vectors gives crisp categories
Auto phrasing vs. Auto filtering
Auto Phrasing
- Multi-term phrases that refer to a single entity.
- Used as a workaround to the Solr “Multi-term synonym problem”
- That is now fixed (as of 6.4.1 - thanks Steve Rowe!)
- Is the Auto phrasing solution now obsolete?
- Answer: NOT!!! That was exploiting a side effect of what it does!
- Uses knowledge from a phrase list to determine what is an auto phrase
- Works on tokenized text fields (implemented as a Lucene TokenFilter)
Query Auto Filtering
- Utilizes information from non-tokenized text fields - inherently solves the multi-term problem
Strategy for “unstructured text”:
Use auto phrasing to extract phrase metadata (keywords) from unstructured text
This metadata can then be consumed by Query Autofilter at search time.
Simple Keyword Analysis
(Diagram labels: “Unstructured” Text; Lucene Analyzer with Auto Phrasing Extensions; Spark Job)
Metadata we would like to have but don’t have - requires lots of manual curation == $$$
Have short descriptive text fields that can be mined to glean useful metadata such as product type, material, size.
Special Sauce Ingredients:
➡ Semantically pure lexicons (things, brands, attributes, dimensions, logos, materials) of key terms
➡ Auto phrasing-based Lucene Analysis to extract key terms and “stop phrases” (e.g. Mr Coffee)
➡ Expansions and Relations based on noun phrases in lexicon. Contextually aware management of precision and recall.
➡ Tricks to deal with “leather case for iPhone”, “DSLR camera with 50-mm lens”
Expansions and Relations
Motivation: eCommerce Use Case:
Search for ‘iPhone’ - get iPhone cases and iPhone chargers mixed in.
- Want to have both BUT want iPhones at the TOP of the result set.
=> TF/IDF doesn’t always deliver on this (can’t control relevance - you get what you get)
i.e. - want recall for upsell opportunities - so relax precision a bit.
Relevance (what I want is on top) is still very important
Search for ‘iPhone case’
Now I want precision - just show me iPhone cases please ‘cause I already got a stinkin’ iPhone!!
Why else would I be looking for accessories for it ???
Expansions and Relations
Noun phrases have structure:
end table
side table
dining room table
picnic table
coffee table
folding table
=> Are ALL types of tables
table cloth
table setting
table lamp
table chair
=> Are table-related things.
Expansions - IS-A relationships
Phrases that end in ‘table’ are specific types of tables
classify ‘end table’ as ‘table’ too
=> search for ‘table’ returns all types of tables
=> search for ‘end table’ just returns end tables
Relations - IS-LIKE Relationships
Phrases that start with ‘table’ are table-related things
Add table-related things to fq for ‘table’ as OR list
Boost search term ‘table’ more than table-related things - get both but tables are first
Table-related things don’t have relations
- search is more specific - just get that thing!
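The head-noun rule above lends itself to a short sketch: phrases ending in the head word are expansions (IS-A), phrases starting with it are relations (IS-LIKE). The field name `product_type`, the boost value and the function names are hypothetical, and splitting the result into an `fq` and a `bq` is one possible way to get “both, but tables first”.

```python
def classify_phrases(head, phrases):
    """Split lexicon phrases into expansions (IS-A) and relations (IS-LIKE)."""
    expansions, relations = [], []
    for phrase in phrases:
        words = phrase.lower().split()
        if len(words) < 2:
            continue  # single words carry no phrase structure
        if words[-1] == head:
            expansions.append(phrase)   # 'end table' is a kind of table
        elif words[0] == head:
            relations.append(phrase)    # 'table lamp' is a table-related thing
    return expansions, relations


def build_filters(head, relations, field="product_type", head_boost=10):
    """Return (fq, bq): an OR list for recall, a boost that keeps the head first."""
    values = " OR ".join(f'"{v}"' for v in [head] + relations)
    fq = f"{field}:({values})"
    bq = f'{field}:"{head}"^{head_boost}'
    return fq, bq
```

Expansions are handled at index time instead: a document typed ‘end table’ would also be given the value ‘table’, so a search for ‘table’ finds it while a search for ‘end table’ stays specific.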
Unstructured Text - Oh My!
The problem of unstructured text is that it is … well, unstructured … or is it? (Linguists don’t think so!)
We search but don’t typically facet on unstructured text fields (i.e. tokenized fields),
even though in Solr we can facet on anything
- Get all of the tokenized terms and their counts as facet values -> very high cardinality
- Absolutely useless for UI drill in - so this is basically a no-no at query time
=> But that is not all that facets are good for so … wait a minute (light-bulb moment)! <=
What if we DID facet on the tokens and used their stats to do some text analysis?
=> It turns out we can use facets to detect keywords in documents. <=
Keywords - Terms that occur in relatively few documents (but not too few).
- Tend to be important words in some subjects but not others
- i.e. their usage is highly contextual to a subject!
Keywords for the same subject area tend to occur together because they share the same context!
Facets are a great context mining tool!!
Sounds like a FIT!
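The “relatively few documents, but not too few” criterion can be expressed as a document-frequency band over facet counts. The thresholds below (`min_docs`, `max_fraction`) are arbitrary assumptions for illustration; the talk does not give specific values.

```python
def candidate_keywords(doc_counts, total_docs, min_docs=3, max_fraction=0.1):
    """Pick terms whose document frequency falls inside a middle band.

    doc_counts maps each term to the number of documents containing it,
    e.g. the facet counts returned for a tokenized field.
    """
    return sorted(
        term
        for term, count in doc_counts.items()
        if count >= min_docs and count / total_docs <= max_fraction
    )
```

Very common terms behave like stop words and very rare ones are too sparse to show co-occurrence, which is why both ends are dropped before looking for clusters.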
Facet Ratios => Keyword Clustering
Method to my Madness:
• Tokenize text with auto phrasing, stop words and synonyms
- store tokens in a multi-valued field with DocValues
- (yes you can facet on a text field but it tends to hit a wall - 2M word limit on facet values)
• Using the /terms handler, get each term in the text field.
• Submit two queries
- one with text_field:[term] (positive Q)
- one with -text_field:[term] (negative Q)
• Calculate the following ratio:
\[ \text{ratio} = \frac{\text{Facet counts (positive Q)} \,/\, \text{Total counts (positive Q)}}{\text{Facet counts (negative Q)} \,/\, \text{Total counts (negative Q)}} \]
• Take the xlog(x) of this ratio (for better discrimination)
- for each term, take the best related terms above some threshold
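The ratio and the xlog(x) step can be sketched as follows. The counts would come from the facet output of the positive and negative queries; the natural logarithm, the handling of zero counts and the function names are assumptions, since the slide does not specify them.

```python
import math


def facet_ratio(pos_facet_count, pos_total, neg_facet_count, neg_total):
    """(facet count / total) for the positive query over the same for the negative."""
    if pos_total == 0 or neg_total == 0 or neg_facet_count == 0:
        return None  # ratio undefined; skipped in this sketch
    return (pos_facet_count / pos_total) / (neg_facet_count / neg_total)


def xlogx(x):
    """x * ln(x), which stretches large ratios and flattens ratios near 1."""
    return x * math.log(x) if x > 0 else 0.0


def related_terms(term_counts, pos_total, neg_total, threshold):
    """Score candidate terms and keep those at or above the threshold.

    term_counts maps a candidate term to (positive facet count, negative facet count).
    """
    scored = []
    for term, (pos_count, neg_count) in term_counts.items():
        ratio = facet_ratio(pos_count, pos_total, neg_count, neg_total)
        if ratio is None:
            continue
        score = xlogx(ratio)
        if score >= threshold:
            scored.append((score, term))
    return sorted(scored, reverse=True)
```

A term that is equally common with and without the seed term gets a ratio of 1 and a score of 0, so only terms that are over-represented alongside the seed term survive the threshold.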
Facet Ratios => Keyword Clusters
Authentication
Score                 Related term(s)
1002.7227722772277    firewall
561.5247524752475     authorization
401.08910891089107    passwords
374.34983498349834    plugging
160.43564356435644    transport
88.81258840169731     tied
80.21782178217822     weblogic,bootstrap,computationally,finely,usernames
56.152475247524755    ssl
40.10891089108911     login,augments,dialog,encapsulated,fallback,privileged
34.87731381833836     kerberos
32.087128712871284    permission
28.64922206506365     password
26.739273927392738    acls,realm,conversely
25.523852385238524    streaming
23.874351720886374    enabling
23.174037403740375    remote
22.91937765205092     protocols
20.054455445544555    globally,bind,indirectly,redirects
18.51180502665651     ldap
16.043564356435642    proxy
14.585058505850585    memberships
14.465508845966564    permissions
13.369636963696369    protect
12.534034653465346    grained
11.918076379066479    linux
11.523002023959302    advice
11.45968882602546     authenticate
11.23049504950495     hash
11.184215536938309    message
10.106182271770484    plugins
10.027227722772277    header,controlled,crafting
9.43739079790332      username
8.913091309130913     logins
32
Facet Ratios => Keyword Clusters
Phase II - check related terms for correlation (count agreements in their related terms)
8984.39603960396 authorization
4411.980198019802 passwords
4010.891089108911 firewall
802.1782178217823 usernames
601.6336633663367 password
508.046204620462 realm
505.3722772277228 ssl
481.3069306930693 login
425.7715156130997 ldap
418.52776582006027 kerberos
320.8712871287129 bootstrap, finely
256.69702970297027 permission
216.98263268949847 permissions
213.9141914191419 acls
185.392299229923 remote
167.1204620462046 enabling
153.14311431143113 streaming
132.12347117064647 username
120.32673267326733 bind
106.95709570957095 conversely
53.478547854785475 logins
46.09200809583721 advice
39.269812336882225 requests
32.52073856034252 zookeeper
31.13477101134771 plugin
28.422165293452423 admin
26.739273927392738 bother
25.465975168945466 controls
24.68240670220868 native
22.716551301147813 require
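The Phase II re-scoring can be sketched as follows. Several scores on this slide are whole-number multiples of the Phase I scores for the same term (for example 561.5247524752475 × 16 = 8984.39603960396 for "authorization"), so the sketch assumes each Phase I score is multiplied by an agreement count. How agreements are counted is a guess: here it is the number of terms a candidate's related list shares with the seed's related list, with the seed itself counting as one. The function name `phase_two` and the data layout are hypothetical.

```python
def phase_two(seed, related):
    """Re-score the seed's related terms by how many related terms they share.

    related maps term -> {related_term: phase_one_score}.
    Candidates with no agreements are dropped (an assumption).
    """
    seed_related = related.get(seed, {})
    # Terms that count as an agreement: the seed's related terms plus the seed.
    seed_set = set(seed_related) | {seed}
    rescored = {}
    for candidate, score in seed_related.items():
        candidate_set = set(related.get(candidate, {}))
        agreements = len(seed_set & candidate_set)
        if agreements:
            rescored[candidate] = score * agreements
    return sorted(rescored.items(), key=lambda item: item[1], reverse=True)
```

A candidate whose own related terms point back at the seed and its neighbours is promoted, while an incidental co-occurrence with no shared neighbours falls away.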
33
Keyword Vector Document Clustering
Use the Keyword Vectors to compute distances between documents rather than raw TF/IDF
=> Higher Signal To Noise
Tokenizer => Compute Keyword Vector => K-Means Clustering
Cluster: 98
stump_the_chump: 15159.853372701356
stump: 12931.059994928455
prize: 12378.463050783357
sight: 2943.012345679012
tough: 2872.8905092427412
question: 2827.6045026881716
judge: 2353.9344100731737
submit: 2250.350305525309
session: 2147.8922671532514
panel: 1888.958487954128
hostetter: 1722.9000585471174
grant: 1600.741568627451
chump: 1558.9513516128222
lucene_revolution: 1353.774672198919
spot: 1211.5869933577087
award: 1048.082490095137
mock: 1005.0931680939833
conference: 903.0025141117053
muir: 878.7673037468955
seat: 870.9154155915239
hot: 799.5070748299321
Get a list of documents for each cluster - label the clusters ==> Document Category
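The pipeline on this slide (tokenize, compute a keyword vector, cluster with K-Means) can be sketched in plain Python. This is an illustration under stated assumptions rather than the speaker's code: a document's keyword vector is taken to be the sum of the related-term vectors of its tokens, distance is cosine distance, and the first k documents seed the centroids. All function names are hypothetical.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Whitespace tokenizer; a real analyzer chain would do more."""
    return text.lower().split()


def document_vector(text, keyword_vectors):
    """Sum the keyword vectors of every token that has one."""
    vec = defaultdict(float)
    for token in tokenize(text):
        for related, score in keyword_vectors.get(token, {}).items():
            vec[related] += score
    return dict(vec)


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for k, w in vec.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}


def kmeans(vectors, k, iterations=10):
    """Tiny K-Means over sparse vectors; returns a cluster index per document."""
    centroids = [dict(v) for v in vectors[:k]]  # naive deterministic seeding
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment
```

Grouping documents by the returned index gives the per-cluster document list that the slide then labels as a document category.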
34
Keyword Vector Document Clustering
Cluster: 85
young_generation: 58393.71450722004
throughput_collector: 51879.272543859726
tenured_space: 45769.06697989158
young_generation_collector: 36786.321736596736
tenure: 33389.738840692735
stop_the_world: 31288.96145142277
concurrent_low_pause_collector: 29612.759802867382
useadaptivesizepolicy: 26927.686004351213
useparnewgc: 26819.583333333332
useparalleloldgc: 25450.34188034188
jvm: 25354.346667094775
young_space: 22546.65166222556
useparallelgc: 22168.126984126982
collector: 21226.31425547997
survivor_space: 20836.967617437283
heap: 18883.16487771459
garbage_collection: 18247.692641501046
garbage_collector: 17929.789619546355
command_line_options: 10111.764705882353
sweep: 9803.141574757969
Thank You

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

A Multifaceted Look at Faceting for Relevant Search

  • 1. A Multifaceted Look At Faceting - Using Facets “Under the Hood” to Facilitate Relevant Search
    Ted Sullivan, Senior Solutions Architect, Lucidworks Professional Services
  • 2. Agenda
      • Facet history - why Solr faceting rocks
      • Text - unstructured? I hardly think so!
      • Using facets as context mining engines
  • 3. Facets are Metadata
    Facet: “A particular aspect or feature of something.”
    Metadata: “Data about data” - attributes, aspects, descriptors, features, properties, traits
    Facets == Metadata
    Metadata Semantics: what, where, when, why
      - name, size, shape, color, material, texture
      - manufacturer, number of outlets, voltage, is pre-assembled …
      - address, phone number, birth date, user rating
      - hand size, likes country music, believes in climate change …
    Metadata Dependencies: some metadata fields depend on “what” the “thing” is, e.g. people have different attributes than toaster ovens.
  • 4. A Little History: Facet Technology Terminology
      Verity K2                  Parametric Search
      Fast ESP                   Navigators
      MS Fast                    Refiners
      Endeca                     Dimensions
      Solr                       Facets
      Google (& Autonomy too)    Facets? Facets? We don’t need no stinkin’ facets!
  • 5. Traditional Uses of Facets
    Faceted Navigation - aka “Refinement” / “Drill down”
      - Allows the initial query to be ambiguous without requiring the user to “rethink” what to search for.
      - Neatly handles the old “No Results Found” trial-and-error bugaboo.
    Downsides:
      - Facets should not be used as a ‘band-aid’ for poorly tuned relevance!!!
      - The “need” for faceted navigation forces us to favor recall over precision, because you have to drill in to something. (Maybe this is why Google avoids them!)
      - If users have to use facets to drill in to what they really want, why search in the first place - why not just browse?
      - Facet ‘noise’ - false positives due to poor precision / high recall cause weird outliers (ML techniques like Signal Aggregation to improve relevance do not help here).
  • 6. Facet Analytics - Maturing Rapidly
    Visualization: facets show a high-level or global ‘context’ of what the result set is “about”.
    Dashboards - Search Driven BI:
      - Eye Candy: pie charts, bar charts, histograms, etc. make use of the basic statistical nature of facets (originally just counts, but now also mean, median, standard deviation, skewness, etc.).
      - Data Analytics: Solr now enhances facet statistics to include many more useful mathematical calculations. Use basic analytics such as mean and standard deviation to do more complex analytics, similar to what is done with databases in OLAP “cubes”.
      - Time-series: Range Facets on a Date-Time (Trie)Field.
    Advantage: analytics are search driven, so the “cube” can change with the query.
  • 7. Solr Facets are Dynamic not Static
    In other search engines like Verity, Fast or Endeca, facet values are computed at index time, which makes them static at query time.
    Lucene did not originally have faceting:
      - Solr added faceting “on the fly”, i.e. at query time (before Lucene added it).
      - Solr faceting is thus dynamic!
    The main hurdle to doing it this way is making it FAST (mission accomplished - thanks Yonik!).
    Although this would seem to be what we engineers call a “bolt on”, in hindsight it was a very fortuitous evolutionary path!! Once you do this, there are serious advantages over index-time faceting!!
    One main advantage is flexibility:
      - In Solr you can facet on just about anything, even things that weren’t thought about when the collection was designed (function queries - extensible with ValueSource impls!).
      - Good, good, good!!!
  • 8. Very Brief Survey of Solr Faceting Methods
    Metadata fields - prefer non-tokenized field types (you can facet on tokenized fields too - but why would you want to?)
      - enum method => uses cached filter queries (the filter cache)
      - fc method => uses the FieldCache (it now uses DocValues)
    Also available:
      - facet query
      - facet prefix
      - facet range
      - function queries and Value Sources
      - pivot facets
      - excluding and tagging
      - JSON Facets
    Facet performance tuning (gotchas) - I could talk about this at length but … nah! Read the Wikis!
  • 9. Language Semantics
    Nouns, Verbs, Adjectives, Adverbs, Prepositions, etc.
      What type of thing is it?               Car
      What is its name?                       Lamborghini Aventador
      Where is it made?                       Italy
      How much horsepower does it have?       A helluva lot
      How fast does it go?                    Very
      How much does it cost?                  If you have to ask …
    My Search Philosophy:
      - Humans use language to search because that is how we reason about things.
      - Search engines need to do a better job of understanding language to better help us find the things we are looking for.
    Search index schemas describe a machine-oriented / data-centric view of things:
      - we want to translate that to and from language-centric views
      - from a search engine perspective, descriptive text is “unstructured” data - but not to us!
  • 10. Metadata and Text Transforms
    Metadata: data-centric view of things <=> Text: language-centric view of things
    Metadata terms are embedded in language.
    Compose descriptive text about a thing from its attributes or properties
      -> Create linguistic expressions from metadata
    Deduce attributes or properties of a thing from descriptive text
      -> Compute metadata by linguistic analyses of text
    Search problem - match terms in the query with things in the index
      -> Knowledge of word meanings is power!!!
      -> Facet metadata constitutes knowledge that can be leveraged!!
  • 11. FACETS ARE CONTEXT DISCOVERY TOOLS
    Lemma 1: Similar things occur in similar contexts
    Lemma 2: Facets are context exploration tools
    Assertion: Facets can be used to find similar things
  • 12. Exploiting Facet Metadata
    Facets provide a sort of global metadata CONTEXT for a search result set.
    In addition to faceting, how can we exploit metadata to enhance search?
    ❖ Turning facet metadata inside-out: Query Autofiltering
    ❖ Using facets to build a contextual typeahead suggester:
        • Pivot facets to construct phrases from structured data.
        • Extract related information using facets at index time to enable security trimming and dynamic boosting.
    ❖ Using facets for text analytics to generate better facets:
        • Facet ratios of positive and negative queries on key terms -> detects “key term clusters”
        • Document clustering using key term cluster vectors -> detects key term categories
  • 13. Example: Detecting User Intent in eCommerce
    Separating the ‘What’ from the ‘What about’ (i.e. a Thing vs. a Really BIG Thing)
      microwave safe dishes       ‘microwave safe’ - adjective phrase
      compact microwave oven      ‘microwave oven’ - noun phrase
      microwave                   ‘microwave’ - noun - contraction for ‘microwave oven’
      coffee filter               ‘coffee filter’ - noun-noun phrase - a filter
      coffee                      ‘coffee’ - noun - a beverage
      coffee table                ‘coffee table’ - noun-noun phrase - a table
      coffee colored sheets       ‘coffee colored’ - adjective phrase
      coffee ice cream            coffee flavored ice cream
      milk chocolate              a type of chocolate
      chocolate milk              flavored milk
  • 14. Query Autofiltering
    Uses field values to generate a “reverse lookup map” that maps values to the fields that contain them.
      - Inverts the “uninverted map” - ah … another type of inverted map: values -> fields
      - Uses the Lucene SynonymMap finite state transducer (FST) implementation
    Uses this map to parse the query and find terms in the query that relate to specific metadata fields.
    Example: ‘red sofa’ maps to
      red => color
      sofa => product_type
    Selects the longest contiguous phrase in its lexicon to match against parts of the query:
      - If you have ‘coffee’ and ‘coffee filter’ in the lexicon (i.e. the Solr collection), the query ‘coffee filter’ will only match ‘coffee filter’.
    Can construct either a Solr filter query (fq) or boost query (bq) using this information.
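The lookup-and-match idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real component is described as using Lucene's SynonymMap FST, whereas this sketch uses a plain dictionary, and the `CATALOG` contents, function names and the choice to emit `field:"value"` clauses are hypothetical.

```python
from typing import Dict, List, Tuple

def build_reverse_map(field_values: Dict[str, List[str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Map each value (as a tuple of lower-cased tokens) to the fields that contain it."""
    reverse: Dict[Tuple[str, ...], List[str]] = {}
    for field, values in field_values.items():
        for value in values:
            key = tuple(value.lower().split())
            fields = reverse.setdefault(key, [])
            if field not in fields:
                fields.append(field)
    return reverse

def autofilter(query: str, reverse: Dict[Tuple[str, ...], List[str]]) -> Tuple[List[str], List[str]]:
    """Return (filter clauses, unmatched terms), preferring the longest contiguous phrase."""
    tokens = query.lower().split()
    max_len = max((len(key) for key in reverse), default=0)
    filters: List[str] = []
    leftover: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in reverse:
                value = " ".join(phrase)
                clauses = [f'{field}:"{value}"' for field in reverse[phrase]]
                filters.append(clauses[0] if len(clauses) == 1 else "(" + " OR ".join(clauses) + ")")
                i += n
                matched = True
                break
        if not matched:
            leftover.append(tokens[i])
            i += 1
    return filters, leftover

# Hypothetical field values standing in for the metadata of a Solr collection.
CATALOG = {
    "color": ["red", "green"],
    "product_type": ["sofa", "coffee", "coffee filter", "coffee table"],
}
REVERSE = build_reverse_map(CATALOG)

print(autofilter("red sofa", REVERSE))
print(autofilter("coffee filter", REVERSE))
```

The clauses returned could then be sent as `fq` or `bq` parameters; unmatched terms would stay in the main query.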
  • 15. Query Autofiltering - Knowledge Mining
    Doing this is a way of exploiting the field/value relationships in the collection metadata.
    What it effectively does is extract the knowledge that is built in to your collection through the facet metadata it contains, and apply that knowledge to parsing the query:
      • It knows that ‘red’ is a color because ‘red’ is a value in the ‘color’ field.
      • It ‘short circuits’ the search-then-drill-in paradigm -> just search!
      • But as the telemarketers say: “Wait! There’s more! …”
    The knowledge about what terms mean and the properties of the term’s field (single-valued vs. multi-valued) provide other opportunities that can be exploited!
  • 16. Query Autofiltering - Language Logic
    Can provide a semblance of “natural language processing” by breaking a query into semantic parts and applying those appropriately.
    Natural Language Boolean vs Mathematical Boolean
    Language usage of boolean terms like ‘AND’ and ‘OR’ is contextual!!
      “show me green or blue shirts”
      is equivalent to
      “show me green and blue shirts”
    The user means ‘both’ in each case, so ‘and’ and ‘or’ are synonyms in this usage context!
    But in
      “show me fast and inexpensive cars”
    ‘and’ means AND!
    It depends on field cardinality! If ‘color’ is single-valued, no single shirt can be both green and blue, so ‘and’ has to mean OR; if ‘attributes’ is multi-valued, one car can be both fast and inexpensive, so ‘and’ means AND.
    Users understand this intuitively. Search engines don’t, but Query Autofilter can get this right!
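A minimal sketch of the cardinality rule, in Python. The function name and the exact rule (use AND only when the field is multi-valued and the user said ‘and’, otherwise OR) are assumptions made for illustration, not the Query Autofilter implementation.

```python
from typing import List

def combine_values(field: str, values: List[str], multi_valued: bool, conjunction: str) -> str:
    """Join several values of one field using the operator the user most likely meant."""
    terms = [f'{field}:"{value}"' for value in values]
    # A single-valued field can never satisfy AND across two different values,
    # so the user's 'and' is read as OR there (assumed rule).
    operator = "AND" if (multi_valued and conjunction.lower() == "and") else "OR"
    return "(" + f" {operator} ".join(terms) + ")"

# "green and blue shirts": color is single-valued, so this becomes an OR.
print(combine_values("color", ["green", "blue"], multi_valued=False, conjunction="and"))
# "fast and inexpensive cars": attributes is multi-valued, so this stays an AND.
print(combine_values("attributes", ["fast", "inexpensive"], multi_valued=True, conjunction="and"))
```

In practice the `multi_valued` flag would come from the collection schema rather than being passed by hand.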
  • 17. Query Autofiltering - Extensions - Query Patterns
    Once you know what individual query terms and phrases mean, you can exploit this by creating templates for popular query patterns.
      Query Pattern:    Terms + facet fields that will be captured by Query AutoFilter
      Query Template:   Query template with placeholders for field values, filled in if the user query matches the pattern
    Example: Music Ontology
      User Query:       Who’s in The Who
      Query Pattern:    (who's in,was in,were in,member of,members of)|${hasPerformer_ss}
      Query Template:   memberOfGroup_ss:${hasPerformer_ss}

      User Query:       Songs Beatles Covered
      Query Pattern:    (song,songs)|${hasPerformer_ss}|covered
      Query Template:   hasPerformer_ss:${hasPerformer_ss} AND version_s:Cover
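The pattern/template mechanism can be sketched in Python as follows. It assumes the query has already been segmented (for example by the autofilter step) into literal text and `(field, value)` pairs; the segment representation, the function names and the decision to quote substituted values are assumptions for illustration only.

```python
import re
from typing import List, Optional, Tuple, Union

Segment = Union[str, Tuple[str, str]]  # literal text, or (field, value) found by autofiltering

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def parse_pattern(pattern: str) -> list:
    """Split 'a|b|c' into literal-alternative sets and field placeholders."""
    parts = []
    for part in pattern.split("|"):
        field = PLACEHOLDER.fullmatch(part)
        if field:
            parts.append(("field", field.group(1)))
        else:
            alternatives = {alt.strip().lower() for alt in part.strip("()").split(",")}
            parts.append(("literal", alternatives))
    return parts

def apply_template(segments: List[Segment], pattern: str, template: str) -> Optional[str]:
    """Return the filled-in template if the segments match the pattern, else None."""
    parts = parse_pattern(pattern)
    if len(parts) != len(segments):
        return None
    bindings = {}
    for (kind, expected), segment in zip(parts, segments):
        if kind == "field":
            if not isinstance(segment, tuple) or segment[0] != expected:
                return None
            bindings[expected] = segment[1]
        elif not isinstance(segment, str) or segment.lower() not in expected:
            return None
    # Quote the substituted values so multi-word values stay one phrase.
    return PLACEHOLDER.sub(lambda m: '"' + bindings[m.group(1)] + '"', template)

print(apply_template(
    ["who's in", ("hasPerformer_ss", "The Who")],
    "(who's in,was in,were in,member of,members of)|${hasPerformer_ss}",
    "memberOfGroup_ss:${hasPerformer_ss}",
))
```

A query that matches no pattern simply returns `None`, so the caller can fall back to the ordinary autofiltered query.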
  • 18. Canned Demo - Who’s In The Who
  • 19. Typeahead - Priming the Pump with Pivot Facet Patterns
    Construct semantically meaningful phrases from multiple metadata fields.
      ✦ Inverse of Query AutoFiltering - creates suggestions that we know how to process!!
      ✦ Uses Solr Pivot Facets to translate field patterns to suggested query phrases
    Examples:
      ${hasPerformer_ss} ${Recording_Type_s}s
        => Beatles Songs, Led Zeppelin Songs, Billy Joel Songs, Frank Zappa Songs etc.
      ${genres_ss} ${Musician_Type_ss}s
        => Classical Pianists, Hard Rock Guitarists, Jazz Drummers
      ${Recording_Type_s}s ${hasPerformer_ss} Covered (with fq version_s:Cover)
        => Songs Jimi Hendrix Covered
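As a rough illustration of turning pivot facets into suggestions, the Python sketch below walks a nested pivot structure of the general shape Solr returns under `facet_pivot` (nodes with `field`, `value`, `count` and an optional nested `pivot` list) and fills a field pattern. The `SAMPLE` data and its counts are made up for the example, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterator, List, Optional, Tuple

def pivot_paths(nodes: List[dict], prefix: Optional[Dict[str, str]] = None) -> Iterator[Tuple[Dict[str, str], int]]:
    """Yield (field -> value bindings, count) for every leaf of a pivot facet tree."""
    prefix = prefix or {}
    for node in nodes:
        bound = {**prefix, node["field"]: node["value"]}
        children = node.get("pivot")
        if children:
            yield from pivot_paths(children, bound)
        else:
            yield bound, node["count"]

def suggestions(pivot_nodes: List[dict], pattern: str) -> List[Tuple[str, int]]:
    """Fill the ${field} placeholders of a pattern for each pivot path, most frequent first."""
    results = []
    for bound, count in pivot_paths(pivot_nodes):
        text = re.sub(r"\$\{(\w+)\}", lambda m: str(bound[m.group(1)]), pattern)
        results.append((text, count))
    return sorted(results, key=lambda item: (-item[1], item[0]))

# Made-up pivot data for facet.pivot=hasPerformer_ss,Recording_Type_s (illustrative counts).
SAMPLE = [
    {"field": "hasPerformer_ss", "value": "Beatles", "count": 3, "pivot": [
        {"field": "Recording_Type_s", "value": "Song", "count": 2},
        {"field": "Recording_Type_s", "value": "Album", "count": 1},
    ]},
    {"field": "hasPerformer_ss", "value": "Billy Joel", "count": 1, "pivot": [
        {"field": "Recording_Type_s", "value": "Song", "count": 1},
    ]},
]

print(suggestions(SAMPLE, "${hasPerformer_ss} ${Recording_Type_s}s"))
```

Because every suggestion is built from known field/value pairs, each one is a phrase the query-side autofilter can parse back into those same fields.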
  • 20. Building a Suggester with Dynamic Context
    Assertion: Facets can be used to find similar things.
    Example: John Lennon and Paul McCartney have many attributes, activities and group memberships in common
      -> They are closely related entities.
    Search Agendas:
      - Users tend to have some high-level goal when searching (e.g. find out information about The Beatles).
      - Agendas can change in a session, but it is likely that queries issued within a short period of time will have a similar goal.
    Conclusion:
      - Meta-information from facets can be used to associate similar things or concepts within a search session.
  • 21. Building a Suggester with Dynamic Context
    Suggester Builder Design (Fusion Connector)
    Uses Facet Queries against a Content Collection to create additional metadata for the Suggester or Typeahead Collection.
    This contextual metadata can then be used for:
      • Security trimming of typeahead suggestions
      • Dynamic boosting of similar suggestions within a user session
  • 22. Building a Suggester with Dynamic Context
    Bring back other fields in addition to the displayed suggestion text (i.e., the ones that were calculated using faceting).
    If a query is used to search, temporarily store its associated metadata in a circular cache on the browser.
    When submitting the next typeahead query, add the cached information from the queue as boost queries.
    Type ‘j’ - get back:
      Jai Johnny Johanson Bands
      Jai Johnny Johanson Groups
      J.J. Johnson
      Jai Johnny Johanson
      Juke Joint Jezebel
      Juke Joint Jimmy
    Just searched for ‘Paul McCartney’, then type ‘j’:
      John Lennon
      John Lennon Songs
      John Lennon Songs Covered
      James P Johnson Songs (?)
      John Lennon Originals
      Hey Jude
  • 23. Building a Suggester with Dynamic Context
    Paul McCartney’s “Meta-informational Context”:
      genres_ss:           Rock, Rock & Roll, Soft Rock, Pop Rock
      hasPerformer_ss:     Beatles, Paul McCartney, José Feliciano, Jimi Hendrix, Joe Cocker, Aretha Franklin, Bon Jovi, Elvis Presley ( … and many more)
      composer_ss:         Paul McCartney, John Lennon, Ringo Starr, George Harrison, George Jackson, Michael Jackson, Sonny Bono
      memberOfGroup_ss:    Beatles, Wings
    Dynamic Boost Query:
      genres_ss:"Rock"^50 genres_ss:"Rock & Roll"^50 genres_ss:"Soft Rock"^50 genres_ss:"Pop Rock"^50
      hasPerformer_ss:"Beatles"^50 hasPerformer_ss:"Paul McCartney"^50 hasPerformer_ss:"José Feliciano"^50 hasPerformer_ss:"Jimi Hendrix"^50
      composer_ss:"Paul McCartney"^50 composer_ss:"John Lennon"^50 composer_ss:"Ringo Starr"^50 composer_ss:"George Harrison"^50
      memberOfGroup_ss:"Beatles"^50 memberOfGroup_ss:"Wings"^50
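The circular cache and the boost query built from it might look roughly like the Python sketch below. The slides describe the cache as living in the browser, so this is only a language-neutral illustration; the class name, the cache size and the de-duplication step are assumptions, and values are not escaped for Solr special characters.

```python
from collections import deque
from typing import Dict, List

class SessionContext:
    """Keeps the facet-derived metadata of the last few searches in a circular buffer."""

    def __init__(self, max_queries: int = 3) -> None:
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self._recent = deque(maxlen=max_queries)

    def remember(self, metadata: Dict[str, List[str]]) -> None:
        """Store the metadata (field -> values) that came back with an executed query."""
        self._recent.append(metadata)

    def boost_query(self, boost: int = 50) -> str:
        """Build a space-separated list of boost clauses from the cached metadata."""
        seen = set()
        clauses = []
        for metadata in self._recent:
            for field, values in metadata.items():
                for value in values:
                    clause = f'{field}:"{value}"^{boost}'
                    if clause not in seen:
                        seen.add(clause)
                        clauses.append(clause)
        return " ".join(clauses)

context = SessionContext(max_queries=2)
context.remember({"genres_ss": ["Rock"], "memberOfGroup_ss": ["Beatles", "Wings"]})
print(context.boost_query())
```

The resulting string would be attached to the next typeahead request as its boost query, so suggestions sharing context with recent searches rise to the top.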
  • 24. Text Mining Analyses
    Problem: Metadata needs to be improved for useful application of QAF (i.e. Real World)
    Case 1: Extracting product type and product attribute metadata from short product descriptions in eCommerce data - dealing with precision and recall.
    Case 2: Large text documents. Want to extract keywords and assign categories to documents.
    Interesting properties of facets when directed towards unstructured text:
      - Facet ratios of positive and negative queries yield “keyword clusters”
      - Document clustering of keyword cluster vectors gives crisp categories
  • 25. Auto phrasing vs. Auto filtering
    Auto Phrasing
      - Handles multi-term phrases that refer to a single entity.
      - Was used as a workaround for the Solr “multi-term synonym problem”.
      - That problem is now fixed (as of 6.4.1 - thanks Steve Rowe!).
      - Is the auto phrasing solution now obsolete?
      - Answer: NO!!! The workaround only exploited a side effect of what auto phrasing does!
      - Uses knowledge from a phrase list to determine what is an auto phrase.
      - Works on tokenized text fields (implemented as a Lucene TokenFilter).
    Query Auto Filtering
      - Utilizes information from non-tokenized text fields - inherently solves the multi-term problem.
    Strategy for “unstructured text”:
      - Use auto phrasing to extract phrase metadata (keywords) from unstructured text.
      - This metadata can then be consumed by Query Autofilter at search time.
  • 26. Simple Keyword Analysis
    [Diagram labels: “Unstructured” Text; Lucene Analyzer with Auto Phrasing Extensions; Spark Job]
    Metadata we would like to have but don’t have - requires lots of manual curation == $$$
    Have short descriptive text fields that can be mined to glean useful metadata such as product type, material, size.
    Special Sauce Ingredients:
      ➡ Semantically pure lexicons (things, brands, attributes, dimensions, logos, materials) of key terms
      ➡ Auto phrasing-based Lucene Analysis to extract key terms and “stop phrases” (e.g. Mr Coffee)
      ➡ Expansions and Relations based on noun phrases in the lexicon. Contextually aware management of precision and recall.
      ➡ Tricks to deal with “leather case for iPhone”, “DSLR camera with 50-mm lens”
  • 27. Expansions and Relations
    Motivation: eCommerce Use Case
    Search for ‘iPhone’ - get iPhone cases and iPhone chargers mixed in.
      - Want to have both BUT want iPhones at the TOP of the result set.
        => TF/IDF doesn’t always deliver on this (can’t control relevance - you get what you get)
      - i.e. want recall for up-sell opportunities, so relax precision a bit.
      - Relevance (what I want is on top) is still very important.
    Search for ‘iPhone case’
      - Now I want precision - just show me iPhone cases please, ’cause I already got a stinkin’ iPhone!! Why else would I be looking for accessories for it???
  • 28. Expansions and Relations
    Noun phrases have structure:
      end table, side table, dining room table, picnic table, coffee table, folding table
        => Are ALL types of tables
      table cloth, table setting, table lamp, table chair
        => Are table related things.
    Expansions - IS-A relationships
      - Phrases that end in ‘table’ are specific types of tables
      - Classify ‘end table’ as ‘table’ too
        => search for ‘table’ returns all types of tables
        => search for ‘end table’ just returns end tables
    Relations - IS-LIKE relationships
      - Phrases that start with ‘table’ are table related things
      - Add table related things to the fq for ‘table’ as an OR list
      - Boost the search term ‘table’ more than table related things - get both, but tables are first
      - Table related things don’t have relations - search is more specific - just get that thing!
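A small Python sketch of the expansion and relation rules above. It assumes product types live in a `product_type` field (the field name used in the earlier ‘red sofa’ example), that lexicon entries are lower-case, and that the head noun of a phrase is its last word; the function names, the boost value and the `LEXICON` list are hypothetical.

```python
from typing import List, Tuple

def index_types(product_type: str) -> List[str]:
    """Expansion (IS-A): index 'end table' as both 'end table' and 'table'."""
    tokens = product_type.lower().split()
    types = [" ".join(tokens)]
    if len(tokens) > 1:
        types.append(tokens[-1])  # the head noun is assumed to be the last word
    return types

def query_filters(term: str, lexicon: List[str], boost: int = 10) -> Tuple[str, str]:
    """Relation (IS-LIKE): return (fq, bq) for a search term."""
    term = " ".join(term.lower().split())
    if " " in term:
        # A multi-word phrase is already specific: no relations, just that thing.
        return f'product_type:"{term}"', ""
    related = sorted(p for p in lexicon if " " in p and p.split()[0] == term)
    values = [term] + related
    fq = "product_type:(" + " OR ".join(f'"{value}"' for value in values) + ")"
    bq = f'product_type:"{term}"^{boost}'  # keep the thing itself above related things
    return fq, bq

# Hypothetical lexicon of product-type phrases.
LEXICON = ["end table", "coffee table", "table cloth", "table lamp"]

print(index_types("dining room table"))
print(query_filters("table", LEXICON))
print(query_filters("table lamp", LEXICON))
```

With the index-time expansion in place, the filter value "table" already matches end tables and coffee tables, so only the related phrases need to be added to the OR list.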
  • 29. Unstructured Text - Oh My!
    The problem of unstructured text is that it is … well, unstructured … or is it? (Linguists don’t think so!)
    We search but don’t typically facet on unstructured text fields (i.e. tokenized fields), even though in Solr we can facet on anything:
      - You get all of the tokenized terms and their counts as facet values -> very high cardinality
      - Absolutely useless for UI drill-in, so this is basically a no-no at query time
    => But that is not all that facets are good for, so … wait a minute (light-bulb moment)! <=
    What if we DID facet on the tokens and used their stats to do some text analysis?
    => It turns out we can use facets to detect keywords in documents. <=
    Keywords:
      - Terms that occur in relatively few documents (but not too few).
      - Tend to be important words in some subjects but not others
      - i.e. their usage is highly contextual to a subject!
    Keywords for the same subject area tend to occur together because they share the same context!
    Facets are a great context mining tool!! Sounds like a FIT!
  • 30. 30 Facet Ratios => Keyword Clustering
    Method to my Madness:
    • Tokenize text with auto phrasing, stop words and synonyms
      - store tokens in a multi-valued field with DocValues
      - (yes, you can facet on a text field, but it tends to hit a wall: 2M word limit on facet values)
    • Using the /terms handler, get each term in the text field.
    • Submit two queries
      - one with text_field:[term] (positive Q)
      - one with -text_field:[term] (negative Q)
    • Calculate the following ratio:
        (Facet counts (positive Q) / Total counts (positive Q)) / (Facet counts (negative Q) / Total counts (negative Q))
    • Take the xlog(x) of this ratio (for better discrimination)
      - for each term, take the best related terms above some threshold
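The ratio and the x log(x) step can be sketched as follows. This is a minimal illustration, assuming the facet counts for the positive and negative queries have already been fetched into plain dictionaries. The function names, the use of the natural logarithm, and the choice to skip terms that never appear in the negative query are assumptions, not details from the slides.

```python
import math


def facet_ratio(pos_facet, pos_total, neg_facet, neg_total):
    """(facet / total) for the positive query over (facet / total) for the negative one."""
    if pos_total == 0 or neg_total == 0 or neg_facet == 0:
        # Assumption: skip terms where the ratio is undefined.
        return None
    return (pos_facet / pos_total) / (neg_facet / neg_total)


def xlogx(x):
    """x * ln(x); the log base is an assumption."""
    return x * math.log(x)


def related_terms(pos_counts, pos_total, neg_counts, neg_total, threshold):
    """Score each co-occurring term and keep those above the threshold."""
    scored = []
    for term, pos_facet in pos_counts.items():
        ratio = facet_ratio(pos_facet, pos_total, neg_counts.get(term, 0), neg_total)
        if ratio is None:
            continue
        score = xlogx(ratio)
        if score > threshold:
            scored.append((term, score))
    # Best related terms first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts for one seed term.
pos = {"firewall": 10, "the": 100}
neg = {"firewall": 5, "the": 1000}
print(related_terms(pos, 100, neg, 1000, threshold=1.0))
```

A term that is equally common with and without the seed term has a ratio of 1 and a score of 0, so it drops out, which is the discrimination the x log(x) step is after.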
  • 31. 31 Facet Ratios => Keyword Clusters
    Authentication
      1002.7227722772277   firewall
      561.5247524752475    authorization
      401.08910891089107   passwords
      374.34983498349834   plugging
      160.43564356435644   transport
      88.81258840169731    tied
      80.21782178217822    weblogic,bootstrap,computationally,finely,usernames
      56.152475247524755   ssl
      40.10891089108911    login,augments,dialog,encapsulated,fallback,privileged
      34.87731381833836    kerberos
      32.087128712871284   permission
      28.64922206506365    password
      26.739273927392738   acls,realm,conversely
      25.523852385238524   streaming
      23.874351720886374   enabling
      23.174037403740375   remote
      22.91937765205092    protocols
      20.054455445544555   globally,bind,indirectly,redirects
      18.51180502665651    ldap
      16.043564356435642   proxy
      14.585058505850585   memberships
      14.465508845966564   permissions
      13.369636963696369   protect
      12.534034653465346   grained
      11.918076379066479   linux
      11.523002023959302   advice
      11.45968882602546    authenticate
      11.23049504950495    hash
      11.184215536938309   message
      10.106182271770484   plugins
      10.027227722772277   header,controlled,crafting
      9.43739079790332     username
      8.913091309130913    logins
  • 32. 32 Facet Ratios => Keyword Clusters
    Phase II - check related terms for correlation (count agreements in their related terms)
      8984.39603960396     authorization
      4411.980198019802    passwords
      4010.891089108911    firewall
      802.1782178217823    usernames
      601.6336633663367    password
      508.046204620462     realm
      505.3722772277228    ssl
      481.3069306930693    login
      425.7715156130997    ldap
      418.52776582006027   kerberos
      320.8712871287129    bootstrap,finely
      256.69702970297027   permission
      216.98263268949847   permissions
      213.9141914191419    acls
      185.392299229923     remote
      167.1204620462046    enabling
      153.14311431143113   streaming
      132.12347117064647   username
      120.32673267326733   bind
      106.95709570957095   conversely
      53.478547854785475   logins
      46.09200809583721    advice
      39.269812336882225   requests
      32.52073856034252    zookeeper
      31.13477101134771    plugin
      28.422165293452423   admin
      26.739273927392738   bother
      25.465975168945466   controls
      24.68240670220868    native
      22.716551301147813   require
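One possible reading of "count agreements in their related terms" is sketched below: for each term related to the seed, count how many of its own related terms also appear in the seed's list. This is an interpretation, not the method from the talk. The slide does not say how agreements are turned into the scores shown, so the sketch returns the raw agreement counts, and the name `agreement_counts` and the sample data are hypothetical.

```python
def agreement_counts(seed, related):
    """Count shared related terms between the seed and each of its related terms.

    related maps a term to the set of terms found related to it in phase I.
    Returns (term, agreements) pairs, highest agreement first.
    """
    seed_terms = related.get(seed, set())
    counts = {}
    for candidate in seed_terms:
        # An agreement is a term that both the seed and the candidate list as related.
        counts[candidate] = len(seed_terms & related.get(candidate, set()))
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical phase I output.
related = {
    "authentication": {"firewall", "passwords", "ssl", "tied"},
    "firewall": {"ssl", "passwords", "proxy"},
    "passwords": {"ssl", "login"},
    "ssl": {"firewall", "passwords"},
    "tied": {"knot"},
}
print(agreement_counts("authentication", related))
```

Under this reading, an incidental term like "tied" shares nothing with the seed's other related terms and sinks to the bottom, consistent with it dropping out of the Phase II list.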
  • 33. 33 Keyword Vector Document Clustering
    Use the Keyword Vectors to compute distances between documents rather than raw TF/IDF => Higher Signal To Noise
    Pipeline: Tokenizer -> Compute Keyword Vector -> K-Means Clustering
    Cluster: 98
      stump_the_chump: 15159.853372701356
      stump: 12931.059994928455
      prize: 12378.463050783357
      sight: 2943.012345679012
      tough: 2872.8905092427412
      question: 2827.6045026881716
      judge: 2353.9344100731737
      submit: 2250.350305525309
      session: 2147.8922671532514
      panel: 1888.958487954128
      hostetter: 1722.9000585471174
      grant: 1600.741568627451
      chump: 1558.9513516128222
      lucene_revolution: 1353.774672198919
      spot: 1211.5869933577087
      award: 1048.082490095137
      mock: 1005.0931680939833
      conference: 903.0025141117053
      muir: 878.7673037468955
      seat: 870.9154155915239
      hot: 799.5070748299321
    Get a list of documents for each cluster - label the clusters ==> Document Category
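To make "distances between documents" concrete, here is a small sketch that compares two documents by their sparse keyword vectors. It uses cosine distance, which is a swapped-in choice for illustration: the slides name K-Means but do not say which distance measure was used. The function names and the sample weights are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword vectors (term -> weight)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector shares no keywords with anything.
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_distance(a, b):
    """Distance in [0, 1] for non-negative weights: 0 = same direction, 1 = no shared keywords."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical keyword vectors for three documents.
doc_gc_1 = {"jvm": 3.0, "heap": 4.0}
doc_gc_2 = {"jvm": 3.0, "heap": 4.0, "sweep": 1.0}
doc_contest = {"prize": 2.0, "judge": 1.0}

print(keyword_distance(doc_gc_1, doc_gc_2))
print(keyword_distance(doc_gc_1, doc_contest))
```

Because only keywords carry weight, documents that share no keywords are maximally distant even if they share many common words, which is one way to picture the higher signal-to-noise claim.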
  • 34. 34 Keyword Vector Document Clustering
    Cluster: 85
      young_generation: 58393.71450722004
      throughput_collector: 51879.272543859726
      tenured_space: 45769.06697989158
      young_generation_collector: 36786.321736596736
      tenure: 33389.738840692735
      stop_the_world: 31288.96145142277
      concurrent_low_pause_collector: 29612.759802867382
      useadaptivesizepolicy: 26927.686004351213
      useparnewgc: 26819.583333333332
      useparalleloldgc: 25450.34188034188
      jvm: 25354.346667094775
      young_space: 22546.65166222556
      useparallelgc: 22168.126984126982
      collector: 21226.31425547997
      survivor_space: 20836.967617437283
      heap: 18883.16487771459
      garbage_collection: 18247.692641501046
      garbage_collector: 17929.789619546355
      command_line_options: 10111.764705882353
      sweep: 9803.141574757969