Building a Real-time, Solr-powered Recommendation Engine

Trey Grainger
Manager, Search Technology Development @ CareerBuilder

Lucene Revolution 2012 - Boston
  	
  	
  
Overview

•  Overview of Search & Matching Concepts
•  Recommendation Approaches in Solr:
    •  Attribute-based
    •  Hierarchical Classification
    •  Concept-based
    •  More-like-this
    •  Collaborative Filtering
    •  Hybrid Approaches
•  Important Considerations & Advanced Capabilities @ CareerBuilder
  
My Background

Trey Grainger
     •  Manager, Search Technology Development @ CareerBuilder.com

Relevant Background
     •  Search & Recommendations
     •  High-volume, N-tier Architectures
     •  NLP, Relevancy Tuning, user group testing, & machine learning

Fun Side Projects
     •  Founder and Chief Engineer @ [logo].com
     •  Currently co-authoring the Solr in Action book… keep your eyes out for the early access release from Manning Publications
  
About Search @ CareerBuilder

•  Over 1 million new jobs each month
•  Over 45 million actively searchable resumes
•  ~250 globally distributed search servers (in the U.S., Europe, & Asia)
•  Thousands of unique, dynamically generated indexes
•  Hundreds of millions of search documents
•  Over 1 million searches an hour
  
Search Products @ CareerBuilder
  	
  
Redefining “Search Engine”

•  “Lucene is a high-performance, full-featured text search engine library…”

   Yes, but really…

•  Lucene is a high-performance, fully-featured token matching and scoring library… which can perform full-text searching.
  
Redefining “Search Engine”

 or, in machine learning speak:

•  A Lucene index is a multi-dimensional sparse matrix… with very fast and powerful lookup capabilities.

•  Think of each field as a matrix containing each term mapped to each document
  
The Lucene Inverted Index
(traditional text example)

What you SEND to Lucene/Solr:

Document    Content Field
doc1        once upon a time, in a land far, far away
doc2        the cow jumped over the moon.
doc3        the quick brown fox jumped over the lazy dog.
doc4        the cat in the hat
doc5        The brown cow said “moo” once.
…           …

How the content is INDEXED into Lucene/Solr (conceptually):

Term        Documents
a           doc1 [2x]
brown       doc3 [1x], doc5 [1x]
cat         doc4 [1x]
cow         doc2 [1x], doc5 [1x]
…           ...
once        doc1 [1x], doc5 [1x]
over        doc2 [1x], doc3 [1x]
the         doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
…           …
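
The mapping from documents to terms shown above can be reproduced in a few lines. This is a minimal sketch, not how Lucene is implemented: the tokenizer (lowercase, letters only) and the names `tokenize` and `build_inverted_index` are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# The five example documents from the slide.
DOCS = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed analyzer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: number of times the term occurs in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


index = build_inverted_index(DOCS)
for term in ("a", "brown", "once", "the"):
    print(term, index[term])
```

Each printed row corresponds to one line of the "Term / Documents" table, with the term frequency playing the role of the [Nx] annotation.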
  
Match Text Queries to Text Fields

         /solr/select/?q=jobcontent:(software engineer)

Job Content Field    Documents
…                    …
engineer             doc1, doc3, doc4, doc5
…                    …
mechanical           doc2, doc4, doc6
…                    …
software             doc1, doc3, doc4, doc7, doc8
…                    …

Matching documents (Venn diagram):
   engineer only:        doc5
   software engineer:    doc1, doc3, doc4
   software only:        doc7, doc8
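
The Venn diagram above is just set arithmetic on the postings lists. A minimal sketch, assuming the postings are held as Python sets; `match_all` and `match_any` are hypothetical helper names, not Solr APIs:

```python
# Postings for the jobcontent field, copied from the slide.
POSTINGS = {
    "engineer": {"doc1", "doc3", "doc4", "doc5"},
    "mechanical": {"doc2", "doc4", "doc6"},
    "software": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}


def match_all(terms, postings):
    """Documents containing every term (AND semantics)."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def match_any(terms, postings):
    """Documents containing at least one term (OR semantics)."""
    return set().union(*(postings.get(term, set()) for term in terms))


both = match_all(["software", "engineer"], POSTINGS)
either = match_any(["software", "engineer"], POSTINGS)
print(sorted(both))           # documents in the overlap of the two circles
print(sorted(either - both))  # documents matching only one of the terms
```

A real engine also scores the matches, so documents in the overlap would normally rank above documents that match a single term.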
  
Beyond Text Searching

•  Lucene/Solr is a text search matching engine

•  When Lucene/Solr search text, they are matching tokens in the query with tokens in the index

•  Anything that can be searched upon can form the basis of matching and scoring:
    –  text, attributes, locations, results of functions, user behavior, classifications, etc.
  	
  
Business Case for Recommendations

•  For companies like CareerBuilder, recommendations can provide as much or even greater business value (e.g. views, sales, job applications) than user-driven search capabilities.

•  Recommendations create stickiness to pull users back to your company’s website, app, etc.

•  What are recommendations?
         … searches of relevant content for a user
  
Approaches to Recommendations

•  Content-based
     –  Attribute based
           •  e.g. income level, hobbies, location, experience
     –  Hierarchical
           •  e.g. “medical//nursing//oncology”, “animal//dog//terrier”
     –  Textual Similarity
           •  e.g. Solr’s MoreLikeThis Request Handler & Search Handler
     –  Concept Based
           •  e.g. Solr => “software engineer”, “java”, “search”, “open source”

•  Behavioral Based
           •  Collaborative Filtering: “Users who liked that also liked this…”

•  Hybrid Approaches
  
Content-based Recommendation Approaches
  
Attribute-based Recommendations

•  Example: Match User Attributes to Item Attribute Fields

     Janes_Profile:{
           Industry:"healthcare",
           Locations:"Boston, MA",
           JobTitle:"Nurse Educator",
           Salary:{ min:40000, max:60000 },
     }

     /solr/select/?q=(jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10)
     AND ((city:"Boston" AND state:"MA")^15 OR state:"MA")
     AND _val_:"map(salary,40000,60000,10,0)"

     //by mapping the importance of each attribute to weights based upon
     your business domain, you can easily find results which match your
     customer’s profile without the user having to initiate a search.
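
A query like the one above is typically generated from the stored profile rather than typed by hand. The sketch below shows one way that could look. The function names, the profile dict layout, and the fixed boosts are assumptions taken from the slide's example, and no escaping of special query characters is attempted. `solr_map` mirrors the semantics of Solr's `map(x,min,max,target,default)` function query.

```python
def solr_map(value, lo, hi, target, default):
    """Same rule as Solr's map(): target if lo <= value <= hi, else default."""
    return target if lo <= value <= hi else default


def build_attribute_query(profile):
    """Turn a profile dict into the boosted query string used on the slide."""
    title = profile["JobTitle"].lower()
    city, state = [part.strip() for part in profile["Locations"].split(",")]
    lo, hi = profile["Salary"]["min"], profile["Salary"]["max"]
    return (
        f'(jobtitle:"{title}"^25 OR jobtitle:({title})^10) '
        f'AND ((city:"{city}" AND state:"{state}")^15 OR state:"{state}") '
        f'AND _val_:"map(salary,{lo},{hi},10,0)"'
    )


janes_profile = {
    "Industry": "healthcare",
    "Locations": "Boston, MA",
    "JobTitle": "Nurse Educator",
    "Salary": {"min": 40000, "max": 60000},
}

query = build_attribute_query(janes_profile)
print(query)
print(solr_map(50000, 40000, 60000, 10, 0))  # salary inside the range
print(solr_map(75000, 40000, 60000, 10, 0))  # salary outside the range
```

Keeping the boosts as plain numbers in one place makes it easy to tune them per business domain, which is the point the slide's closing comment makes.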
  
Hierarchical Recommendations

•  Example: Match User Attributes to Item Attribute Fields

           Janes_Profile:{
                 MostLikelyCategory:"healthcare//nursing//oncology",
                 2ndMostLikelyCategory:"healthcare//nursing//transplant",
                 3rdMostLikelyCategory:"educator//postsecondary//nursing", …
           }

    /solr/select/?q=(category:(
                         ("healthcare.nursing.oncology"^40
                         OR "healthcare.nursing"^20
                         OR "healthcare"^10)
                         OR
                         ("healthcare.nursing.transplant"^20
                         OR "healthcare.nursing"^10
                         OR "healthcare"^5)
                         OR
                         ("educator.postsecondary.nursing"^10
                         OR "educator.postsecondary"^5
                         OR "educator")))
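
The boosts in the query above roughly halve at each step up the hierarchy. A small sketch of generating such clauses follows. The halving rule is an assumption read off the slide's numbers (the slide leaves the last "educator" clause unboosted, whereas this sketch emits ^2.5), and `hierarchy_clause` and `build_category_query` are hypothetical names.

```python
def hierarchy_clause(path, top_boost, sep="."):
    """Boost the full path by top_boost and halve the boost for each ancestor."""
    parts = path.split(sep)
    clauses = []
    boost = float(top_boost)
    # Walk from the most specific category up to the root category.
    for depth in range(len(parts), 0, -1):
        prefix = sep.join(parts[:depth])
        clauses.append(f'"{prefix}"^{boost:g}')
        boost /= 2
    return "(" + " OR ".join(clauses) + ")"


def build_category_query(ranked_paths, boosts, field="category"):
    """OR together one clause group per likely category, most likely first."""
    groups = [hierarchy_clause(p, b) for p, b in zip(ranked_paths, boosts)]
    return f"{field}:(" + " OR ".join(groups) + ")"


query = build_category_query(
    ["healthcare.nursing.oncology",
     "healthcare.nursing.transplant",
     "educator.postsecondary.nursing"],
    [40, 20, 10],
)
print(query)
```

Documents tagged with the exact category score highest, while documents that share only a parent category still match with a smaller boost.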
  
    	
  
Textual Similarity-based Recommendations

•  Solr’s More Like This Request Handler / Search Handler are a good example of this.

•  Essentially, “important keywords” are extracted from one or more documents and turned into a search.

•  This results in secondary search results which demonstrate textual similarity to the original document(s)

•  See http://wiki.apache.org/solr/MoreLikeThis for example usage

•  Currently no distributed search support (but a patch is available)
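
To make the "important keywords become a search" idea concrete, here is a simplified stand-in written in plain Python. It is not Solr's MoreLikeThis implementation: the tf * idf ranking, the tokenizer, the sample documents, and the `content` field name are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def important_terms(doc_id, docs, max_terms=3):
    """Rank one document's terms by tf * idf and return the top few."""
    tokenized = {d: tokenize(t) for d, t in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    tf = Counter(tokenized[doc_id])
    scores = {
        term: freq * math.log(len(tokenized) / doc_freq[term])
        for term, freq in tf.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:max_terms]


def more_like_this_query(doc_id, docs, field="content", max_terms=3):
    """Turn the extracted keywords into a query string."""
    terms = important_terms(doc_id, docs, max_terms)
    return f"{field}:({' OR '.join(terms)})"


# Hypothetical sample documents.
docs = {
    "d1": "nurse educator nurse training hospital",
    "d2": "software engineer java hospital",
    "d3": "nurse practitioner hospital clinic",
}
print(more_like_this_query("d1", docs))
```

Terms that appear in every document (here "hospital") get an idf of zero and drop out, which is why they are not treated as important keywords.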
  
Concept Based Recommendations

Approaches:

  1) Create a Taxonomy/Dictionary to define your concepts and then either:

       a) manually tag documents as they come in
            //Very hard to scale… see Amazon Mechanical Turk if you must do this
       or
       b) create a classification system which automatically tags content as it comes in (supervised machine learning)
            //See Apache Mahout

  2) Use an unsupervised machine learning algorithm to cluster documents and dynamically discover concepts (no dictionary required).
       //This is already built into Solr using Carrot2!
  
How Clustering Works
  
Setting Up Clustering in solrconfig.xml

<searchComponent name="clustering" enable="true" class="solr.clustering.ClusteringCompo
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
  </lst>
</searchComponent>

<requestHandler name="/clustering" enable="true" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
  
Clustering Search in Solr

•  /solr/clustering/?q=content:nursing
       &rows=100
       &carrot.title=titlefield
       &carrot.snippet=titlefield
       &LingoClusteringAlgorithm.desiredClusterCountBase=25
       &group=false //clustering & grouping don’t currently play nicely

•  Allows you to dynamically identify “concepts” and their prevalence within a user’s top search results
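
The request above can be assembled programmatically so the parameters are URL-encoded correctly. A minimal sketch using only the standard library; the base URL `http://localhost:8983/solr` and the helper name `clustering_url` are assumptions, while the parameter names are the ones shown on the slide.

```python
from urllib.parse import urlencode


def clustering_url(base_url, query, title_field, rows=100, cluster_count_base=25):
    """Build the /clustering request shown on the slide."""
    params = [
        ("q", query),
        ("rows", rows),
        ("carrot.title", title_field),
        ("carrot.snippet", title_field),
        ("LingoClusteringAlgorithm.desiredClusterCountBase", cluster_count_base),
        # The slide notes that clustering and grouping don't play nicely together.
        ("group", "false"),
    ]
    return f"{base_url}/clustering/?{urlencode(params)}"


url = clustering_url("http://localhost:8983/solr", "content:nursing", "titlefield")
print(url)
```

Fetching the URL and reading the cluster labels out of the response is left out here, because the response layout depends on the Solr version and the `wt` setting.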
  
Search: Nursing

Search: .Net
  
Example Concept-based Recommendation
Stage 1: Identify Concepts

Original Query:  q=(solr OR lucene)
  // can be a user’s search, their job title, a list of skills,
  // or any other keyword rich data source

Clusters Identified:
   Developer (22)
   Java Developer (13)
   Software (10)
   Senior Java Developer (9)
   Architect (6)
   Software Engineer (6)
   Web Developer (5)
   Search (3)
   Software Developer (3)
   Systems (3)
   Administrator (2)
   Hadoop Engineer (2)
   Java J2EE (2)
   Search Development (2)
   Software Architect (2)
   Solutions Architect (2)

Facets Identified (occupation):
   Computer Software Engineers
   Web Developers
   ...
  
Example Concept-based Recommendation
Stage 2: Run Recommendations Search

q=content:("Developer"^22 OR "Java Developer"^13 OR "Software"^10
OR "Senior Java Developer"^9 OR "Architect"^6 OR "Software Engineer"^6
OR "Web Developer"^5 OR "Search"^3 OR "Software Developer"^3
OR "Systems"^3 OR "Administrator"^2 OR "Hadoop Engineer"^2
OR "Java J2EE"^2 OR "Search Development"^2
OR "Software Architect"^2 OR "Solutions Architect"^2)
AND occupation:("Computer Software Engineers" OR "Web Developers")

// You can also add the user’s location or the original keywords to the
// recommendations search if it helps results quality for your use-case.
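
Stage 2 is mechanical once Stage 1 has returned cluster labels and counts: each count becomes the boost on its label. A minimal sketch, assuming the clusters arrive as (label, count) pairs; `concept_query` is a hypothetical helper and performs no query escaping.

```python
def concept_query(clusters, occupations, field="content"):
    """Use each cluster's document count as the boost for its label."""
    boosted = " OR ".join(f'"{label}"^{count}' for label, count in clusters)
    facets = " OR ".join(f'"{name}"' for name in occupations)
    return f"{field}:({boosted}) AND occupation:({facets})"


# First few clusters and the occupation facets from the Stage 1 slide.
clusters = [("Developer", 22), ("Java Developer", 13), ("Software", 10)]
occupations = ["Computer Software Engineers", "Web Developers"]

query = concept_query(clusters, occupations)
print(query)
```

Operators are written in uppercase because the standard Lucene query syntax only recognizes `AND` and `OR` as operators in that form.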
  
Example Concept-based Recommendation
Stage 3: Returning the Recommendations

…
  
Important Side-bar: Geography
  
Geography and Recommendations

•  Filtering or boosting results based upon geographical area or distance can help greatly for certain use cases:
     –  Jobs/Resumes, Tickets/Concerts, Restaurants

•  For other use cases, location sensitivity is nearly worthless:
     –  Books, Songs, Movies

     /solr/select/?q=(Standard Recommendation Query) AND
     _val_:"(recip(geodist(location, 40.7142, 74.0064),1,1,0))"

     // there are dozens of well-documented ways to search/filter/sort/boost
     // on geography in Solr. This is just one example.
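
To see what the boost in that query does numerically: Solr's `recip(x,m,a,b)` evaluates to `a / (m*x + b)`, so `recip(distance,1,1,0)` is simply `1/distance`. The sketch below reproduces that arithmetic outside Solr. The haversine function is an assumed stand-in for `geodist`, and both function names are illustrative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (stand-in for geodist)."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


# The boost shrinks quickly as the distance grows.
for distance_km in (1, 10, 100):
    print(distance_km, recip(distance_km, 1, 1, 0))

# One degree of longitude along the equator, as a sanity check.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```

With `b=0` the value is undefined at a distance of exactly zero, so a small positive `b` may be worth considering if items can share the user's exact coordinates.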
  
     	
  
     	
  
     	
  
     	
  
Behavior-based Recommendation Approaches
(Collaborative Filtering)
  
The Lucene Inverted Index
(user behavior example)

What you SEND to Lucene/Solr:

Document    “Users who bought this product” Field
doc1        user1, user4, user5
doc2        user2, user3
doc3        user4
doc4        user4, user5
doc5        user4, user1
…           …

How the content is INDEXED into Lucene/Solr (conceptually):

Term        Documents
user1       doc1, doc5
user2       doc2
user3       doc2
user4       doc1, doc3, doc4, doc5
user5       doc1, doc4
…           …
  
Collaborative Filtering

•  Step 1: Find similar users who like the same documents

         q=documentid:("doc1" OR "doc4")

  Document    “Users who bought this product” Field
  doc1        user1, user4, user5
  doc2        user2, user3
  doc3        user4
  doc4        user4, user5
  doc5        user4, user1
  …           …

  Matches:
     doc1:  user1, user4, user5
     doc4:  user4, user5

  Top Scoring Results (Most Similar Users):
     1)  user5 (2 shared likes)
     2)  user4 (2 shared likes)
     3)  user1 (1 shared like)
  
Collaborative Filtering

•  Step 2: Search for docs “liked” by those similar users

  Most Similar Users:
     1)  user5 (2 shared likes)
     2)  user4 (2 shared likes)
     3)  user1 (1 shared like)

         /solr/select/?q=userlikes:("user5"^2 OR "user4"^2 OR "user1"^1)

  Term        Documents
  user1       doc1, doc5
  user2       doc2
  user3       doc2
  user4       doc1, doc3, doc4, doc5
  user5       doc1, doc4
  …           …

  Top Recommended Documents:
     1) doc1 (matches user4, user5, user1)
     2) doc4 (matches user4, user5)
     3) doc5 (matches user4, user1)
     4) doc3 (matches user4)

     //Doc 2 does not match
     //above example ignores idf calculations
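
The two steps can be traced end to end on the toy data from the slides. This is a sketch of the idea only: it sums boosts instead of running Lucene scoring (so, like the slide, it ignores idf), and `similar_users` and `recommend` are hypothetical names.

```python
from collections import Counter

# "Users who bought this product" field for each document, from the slides.
DOC_USERS = {
    "doc1": ["user1", "user4", "user5"],
    "doc2": ["user2", "user3"],
    "doc3": ["user4"],
    "doc4": ["user4", "user5"],
    "doc5": ["user4", "user1"],
}


def similar_users(liked_docs, doc_users):
    """Step 1: count how many of the liked documents each user shares."""
    counts = Counter()
    for doc in liked_docs:
        counts.update(doc_users.get(doc, []))
    return counts


def recommend(user_weights, doc_users):
    """Step 2: score each document by the summed weights of its matching users."""
    scores = {}
    for doc, users in doc_users.items():
        score = sum(user_weights.get(user, 0) for user in users)
        if score > 0:
            scores[doc] = score
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


weights = similar_users(["doc1", "doc4"], DOC_USERS)
ranking = recommend(weights, DOC_USERS)
print(dict(weights))
print(ranking)
```

As on the slide, the documents the user already liked (doc1 and doc4) appear in the ranking; a production query would usually exclude them with a filter.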
  
Lots of Variations

•      Users -> Item(s)
•      User -> Item(s) -> Users
•      Item -> Users -> Item(s)
•      etc.

                  User 1    User 2    User 3    User 4    …
        Item 1    X         X         X                   …
        Item 2              X                   X         …
        Item 3              X         X                   …
        Item 4                                  X         …
        …         …         …         …         …         …

Note: Just because this example tags with “users” doesn’t mean you have to.
You can map any entity to any other related entity and achieve a similar result.
  

	
  
Comparison with Mahout

•  Recommendations are much easier for us to perform in Solr:
      –  Data is already present and up-to-date
      –  Doesn’t require writing significant code to make changes (just changing queries)
      –  Recommendations are real-time as opposed to asynchronously processed off-line
      –  Allows easy utilization of any content and available functions to boost results

•  Our initial tests show our collaborative filtering approach in Solr significantly outperforms our Mahout tests in terms of results quality
      –  Note: We believe that some portion of the quality issues we have with the Mahout implementation have to do with staleness of data due to the frequency with which our data is updated.

•  Our general takeaway:
      –  We believe that Mahout might be able to return better matches than Solr with a lot of custom work, but it does not perform better for us out of the box.

•  Because we already scale…
      –  Since we already have all of our data indexed in Solr (tens to hundreds of millions of documents), there’s no need for us to rebuild a sparse matrix in Hadoop (your needs may be different).
  	
  
Hybrid Recommendation Approaches
  
Hybrid Approaches

•  Not much to say here, I think you get the point.

•  /solr/select/?q=category:("healthcare.nursing.oncology"^10
   OR "healthcare.nursing"^5 OR "healthcare")
   OR title:"Nurse Educator"^15
   AND _val_:"map(salary,40000,60000,10,0)"^5
   AND _val_:"(recip(geodist(location, 40.7142, 74.0064),1,1,0))"

•  Combining multiple approaches generally yields better overall results if done intelligently. Experimentation is key here.
  
Important Considerations & Advanced Capabilities @ CareerBuilder
  
Important Considerations @ CareerBuilder

•  Payload Scoring
•  Measuring Results Quality
•  Understanding our Users
  
Custom Scoring with Payloads

•  In addition to boosting search terms and fields, content within the same field can also be boosted differently using Payloads (requires a custom scoring implementation):

•  Content Field:
          design [1] / engineer [1] / really [ ] / great [ ] / job [ ] / ten [3] / years [3] / experience [3] / careerbuilder [2] / design [2], …

          Payload Bucket Mappings:
          jobtitle: bucket=[1] boost=10; company: bucket=[2] boost=4;
          jobdescription: bucket=[] weight=1; experience: bucket=[3] weight=1.5

          We can pass in a parameter to Solr at query time specifying the boost to apply to each bucket, e.g. …&bucketWeights=1:10;2:4;3:1.5;default:1;

•  This allows us to map many relevancy buckets to search terms at index time and adjust the weighting at query time without having to search across hundreds of fields.

•  By making all scoring parameters overridable at query time, we are able to do A/B testing to consistently improve our relevancy model
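
The bucket idea can be illustrated without Lucene. The sketch below is not CareerBuilder's custom scorer, which the slides do not show. It assumes a deliberately simple rule (each matching token contributes the weight of its bucket) and uses hypothetical helper names. Only the `bucketWeights` string format and the token/bucket pairs come from the slide.

```python
def parse_bucket_weights(param):
    """Parse '1:10;2:4;3:1.5;default:1;' into {'1': 10.0, ..., 'default': 1.0}."""
    weights = {}
    for pair in param.split(";"):
        if pair:
            bucket, weight = pair.split(":")
            weights[bucket.strip()] = float(weight)
    return weights


def score(query_terms, tokens, weights):
    """Sum the bucket weight of every indexed token that matches a query term.

    tokens is a list of (term, bucket) pairs; bucket is None when the token
    carries no payload, in which case the default weight applies.
    """
    default = weights.get("default", 1.0)
    wanted = set(query_terms)
    return sum(weights.get(bucket, default) for term, bucket in tokens if term in wanted)


# Content field from the slide: each token with its payload bucket.
TOKENS = [
    ("design", "1"), ("engineer", "1"), ("really", None), ("great", None),
    ("job", None), ("ten", "3"), ("years", "3"), ("experience", "3"),
    ("careerbuilder", "2"), ("design", "2"),
]

weights = parse_bucket_weights("1:10;2:4;3:1.5;default:1;")
print(score(["design"], TOKENS, weights))               # job title hit + company hit
print(score(["great", "experience"], TOKENS, weights))  # description hit + experience hit
```

Because the weights arrive as a request parameter, two variants of a relevancy model differ only in that one string, which is what makes the query-time A/B testing described above cheap.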
  
Measuring Results Quality

•  A/B Testing is key to understanding our search results quality.

•  Users are randomly divided between equal groups

•  Each group experiences a different algorithm for the duration of the test

•  We can measure “performance” of the algorithm based upon changes in user behavior:
      –  For us, more job applications = more relevant results
      –  For other companies, that might translate into products purchased, additional friends requested, or non-search pages viewed

•  We use this to test both keyword search results and also recommendations quality
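
One common way to keep a user in the same group for the duration of a test is to derive the group from a hash of the user id. The slides only say that users are divided randomly, so the hashing scheme, the function name, and the test name below are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def assign_group(user_id, test_name, groups=("A", "B")):
    """Deterministically map a user to one of the (roughly equal) groups."""
    key = f"{test_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return groups[int(digest, 16) % len(groups)]


# The same user always lands in the same group for a given test.
print(assign_group("user42", "recs-boost-test"))

# Group sizes over many hypothetical users come out close to equal.
sizes = Counter(assign_group(f"user{i}", "recs-boost-test") for i in range(10000))
print(sizes)
```

Including the test name in the hashed key means a user's group in one test is independent of their group in another.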
  	
  
Understanding our Users
(given limited information)
  
Understanding Our Users

•  Machine learning algorithms can help us understand what matters most to different groups of users.

[Chart: Example: Willingness to relocate for a job (miles per percentile). Y-axis: 0 to 2,500 miles. X-axis: percentiles from 1% to 95%. Series: Title Examiners, Abstractors, and Searchers; Software Developers, Systems Software; Food Preparation Workers.]
  
Key Takeaways

•  Recommendations can be as valuable or more than keyword search.

•  If your data fits in Solr then you have everything you need to build an industry-leading recommendation system

•  Even a single keyword can be enough to begin making meaningful recommendations. Build up intelligently from there.
  
Contact Info

    §  Trey Grainger
           trey.grainger@careerbuilder.com
           http://www.careerbuilder.com
           @treygrainger

And yes, we are hiring – come chat with me if you are interested.
  

Reflected intelligence evolving self-learning data systemsTrey Grainger
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewKevin Watters
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation EnginesTrey Grainger
 
Self-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrSelf-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrTrey Grainger
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionLucidworks
 
Solr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW TechnologySolr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW TechnologyLucidworks
 
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksA Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksLucidworks
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksLucidworks
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
Doing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters KluwerDoing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters KluwerLucidworks
 

Was ist angesagt? (20)

Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw... Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query Overview
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
Self-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrSelf-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache Solr
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with Fusion
 
Solr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW TechnologySolr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW Technology
 
Vespa, A Tour
Vespa, A TourVespa, A Tour
Vespa, A Tour
 
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksA Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, Lucidworks
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Doing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters KluwerDoing Synonyms Right - John Marquiss, Wolters Kluwer
Doing Synonyms Right - John Marquiss, Wolters Kluwer
 

Andere mochten auch

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Lucidworks
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineNYC Predictive Analytics
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...lucenerevolution
 
Database indexing framework
Database indexing frameworkDatabase indexing framework
Database indexing frameworkNitin Pande
 
It's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, LucidworksIt's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, LucidworksLucidworks
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)dnaber
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Building Mobile Applications with Ionic
Building Mobile Applications with IonicBuilding Mobile Applications with Ionic
Building Mobile Applications with IonicMorris Singer
 
Search engines powerpoint
Search engines powerpointSearch engines powerpoint
Search engines powerpointvbaker2210
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesLucidworks (Archived)
 
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseImplementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseLucidworks (Archived)
 
Presentation - 5 Ways an Online Recommendation Engine can Increase Sales
Presentation - 5 Ways an Online Recommendation Engine can Increase SalesPresentation - 5 Ways an Online Recommendation Engine can Increase Sales
Presentation - 5 Ways an Online Recommendation Engine can Increase SalesKriti Sarda
 
E commerce search strategies how faceted navigation and apache solr lucene op...
E commerce search strategies how faceted navigation and apache solr lucene op...E commerce search strategies how faceted navigation and apache solr lucene op...
E commerce search strategies how faceted navigation and apache solr lucene op...Lucidworks (Archived)
 
Search Engines Presentation
Search Engines PresentationSearch Engines Presentation
Search Engines PresentationJSCHO9
 
Introduction to Search Engines
Introduction to Search EnginesIntroduction to Search Engines
Introduction to Search EnginesNitin Pande
 
SolrとElasticsearchを比べてみよう
SolrとElasticsearchを比べてみようSolrとElasticsearchを比べてみよう
SolrとElasticsearchを比べてみようShinsuke Sugaya
 

Andere mochten auch (20)

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
 
Database indexing framework
Database indexing frameworkDatabase indexing framework
Database indexing framework
 
It's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, LucidworksIt's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, Lucidworks
 
Types of Search Engines
Types of Search EnginesTypes of Search Engines
Types of Search Engines
 
Lucene basics
Lucene basicsLucene basics
Lucene basics
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Building Mobile Applications with Ionic
Building Mobile Applications with IonicBuilding Mobile Applications with Ionic
Building Mobile Applications with Ionic
 
Search engines powerpoint
Search engines powerpointSearch engines powerpoint
Search engines powerpoint
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User Preferences
 
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseImplementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
 
Presentation - 5 Ways an Online Recommendation Engine can Increase Sales
Presentation - 5 Ways an Online Recommendation Engine can Increase SalesPresentation - 5 Ways an Online Recommendation Engine can Increase Sales
Presentation - 5 Ways an Online Recommendation Engine can Increase Sales
 
E commerce search strategies how faceted navigation and apache solr lucene op...
E commerce search strategies how faceted navigation and apache solr lucene op...E commerce search strategies how faceted navigation and apache solr lucene op...
E commerce search strategies how faceted navigation and apache solr lucene op...
 
Search Engines Presentation
Search Engines PresentationSearch Engines Presentation
Search Engines Presentation
 
Introduction to Search Engines
Introduction to Search EnginesIntroduction to Search Engines
Introduction to Search Engines
 
Search engines
Search enginesSearch engines
Search engines
 
SolrとElasticsearchを比べてみよう
SolrとElasticsearchを比べてみようSolrとElasticsearchを比べてみよう
SolrとElasticsearchを比べてみよう
 

Ähnlich wie Building a Real-time Solr-powered Recommendation Engine

Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp...
Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp...Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp...
Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp...Lucidworks
 
Designing and Implementing Search Solutions
Designing and Implementing Search SolutionsDesigning and Implementing Search Solutions
Designing and Implementing Search SolutionsFindwise
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information RetrievalCarsten Eickhoff
 
Improving Search in Workday Products using Natural Language Processing
Improving Search in Workday Products using Natural Language ProcessingImproving Search in Workday Products using Natural Language Processing
Improving Search in Workday Products using Natural Language ProcessingDataWorks Summit
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesTrey Grainger
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 CareerBuilder.com
 

Ähnlich wie Building a Real-time Solr-powered Recommendation Engine (20)

Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Data Science
Data ScienceData Science
Data Science
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp...
Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp...Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp...
Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bp...
 
Designing and Implementing Search Solutions
Designing and Implementing Search SolutionsDesigning and Implementing Search Solutions
Designing and Implementing Search Solutions
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Apache solr
Apache solrApache solr
Apache solr
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
Improving Search in Workday Products using Natural Language Processing
Improving Search in Workday Products using Natural Language ProcessingImproving Search in Workday Products using Natural Language Processing
Improving Search in Workday Products using Natural Language Processing
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
 
Longwell final ppt
Longwell final pptLongwell final ppt
Longwell final ppt
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
Sindice warehousing meetup
Sindice warehousing meetupSindice warehousing meetup
Sindice warehousing meetup
 

Mehr von lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 



Building a Real-time Solr-powered Recommendation Engine

  • 1. Building a Real-time, Solr-powered Recommendation Engine
      Trey Grainger
      Manager, Search Technology Development @ CareerBuilder
      Lucene Revolution 2012 - Boston
  • 2. Overview
      • Overview of Search & Matching Concepts
      • Recommendation Approaches in Solr:
          • Attribute-based
          • Hierarchical Classification
          • Concept-based
          • More-like-this
          • Collaborative Filtering
          • Hybrid Approaches
      • Important Considerations & Advanced Capabilities @ CareerBuilder
  • 3. My Background
      Trey Grainger
          • Manager, Search Technology Development @ CareerBuilder.com
      Relevant Background
          • Search & Recommendations
          • High-volume, N-tier Architectures
          • NLP, Relevancy Tuning, user group testing, & machine learning
      Fun Side Projects
          • Founder and Chief Engineer @ ….com
          • Currently co-authoring the Solr in Action book… keep your eyes out for the early access release from Manning Publications
  • 4. About Search @CareerBuilder
      • Over 1 million new jobs each month
      • Over 45 million actively searchable resumes
      • ~250 globally distributed search servers (in the U.S., Europe, & Asia)
      • Thousands of unique, dynamically generated indexes
      • Hundreds of millions of search documents
      • Over 1 million searches an hour
  • 6. Redefining “Search Engine”
      • “Lucene is a high-performance, full-featured text search engine library…”
      Yes, but really…
      • Lucene is a high-performance, fully-featured token matching and scoring library… which can perform full-text searching.
  • 7. Redefining “Search Engine”
      or, in machine learning speak:
      • A Lucene index is a multi-dimensional sparse matrix… with very fast and powerful lookup capabilities.
      • Think of each field as a matrix containing each term mapped to each document
  • 8. The Lucene Inverted Index (traditional text example)
      What you SEND to Lucene/Solr:

      Document | Content Field
      doc1     | once upon a time, in a land far, far away
      doc2     | the cow jumped over the moon.
      doc3     | the quick brown fox jumped over the lazy dog.
      doc4     | the cat in the hat
      doc5     | The brown cow said “moo” once.
      …        | …

      How the content is INDEXED into Lucene/Solr (conceptually):

      Term  | Documents
      a     | doc1 [2x]
      brown | doc3 [1x], doc5 [1x]
      cat   | doc4 [1x]
      cow   | doc2 [1x], doc5 [1x]
      …     | …
      once  | doc1 [1x], doc5 [1x]
      over  | doc2 [1x], doc3 [1x]
      the   | doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
      …     | …
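The term-to-documents mapping on this slide can be reproduced in a few lines. The following is a minimal sketch in Python, not Lucene's actual implementation: the tokenizer (lowercase, split on non-alphanumerics) and the function name `build_inverted_index` are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# The five example documents from the slide.
docs = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}

def build_inverted_index(documents):
    """Map each term to {doc_id: term frequency} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Assumed analysis chain: lowercase, then keep alphanumeric runs.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for term, freq in Counter(tokens).items():
            index[term][doc_id] = freq
    return index

index = build_inverted_index(docs)
print(index["the"])
print(index["a"])
```

Looking up a term is then a single dictionary access, which is the "very fast lookup" property the previous slide refers to.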
  • 9. Match Text Queries to Text Fields
      /solr/select/?q=jobcontent:(software engineer)

      Job Content Field (term) | Documents
      …                        | …
      engineer                 | doc1, doc3, doc4, doc5
      …                        | …
      mechanical               | doc2, doc4, doc6
      …                        | …
      software                 | doc1, doc3, doc4, doc7, doc8
      …                        | …

      [Venn diagram: “engineer” only: doc5; “software engineer” (both terms): doc1, doc3, doc4; “software” only: doc7, doc8]
  • 10. Beyond Text Searching
      • Lucene/Solr is a text search matching engine
      • When Lucene/Solr search text, they are matching tokens in the query with tokens in the index
      • Anything that can be searched upon can form the basis of matching and scoring:
          – text, attributes, locations, results of functions, user behavior, classifications, etc.
  • 11. Business Case for Recommendations
      • For companies like CareerBuilder, recommendations can provide as much or even greater business value (e.g. views, sales, job applications) than user-driven search capabilities.
      • Recommendations create stickiness to pull users back to your company’s website, app, etc.
      • What are recommendations? … searches of relevant content for a user
  • 12. Approaches to Recommendations
      • Content-based
          – Attribute based
              • e.g. income level, hobbies, location, experience
          – Hierarchical
              • e.g. “medical//nursing//oncology”, “animal//dog//terrier”
          – Textual Similarity
              • e.g. Solr’s MoreLikeThis Request Handler & Search Handler
          – Concept Based
              • e.g. Solr => “software engineer”, “java”, “search”, “open source”
      • Behavioral Based
          • Collaborative Filtering: “Users who liked that also liked this…”
      • Hybrid Approaches
  • 14. Attribute-based Recommendations
      • Example: Match User Attributes to Item Attribute Fields

      Janes_Profile:{
        Industry:"healthcare",
        Locations:"Boston, MA",
        JobTitle:"Nurse Educator",
        Salary:{ min:40000, max:60000 },
      }

      /solr/select/?q=(jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10) AND ((city:"Boston" AND state:"MA")^15 OR state:"MA") AND _val_:"map(salary,40000,60000,10,0)"

      //by mapping the importance of each attribute to weights based upon your business domain, you can easily find results which match your customer’s profile without the user having to initiate a search.
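Turning a stored profile into the query on this slide is mostly string assembly. The sketch below is one hedged way to do it in Python; the function name `build_attribute_query` and the flat profile keys (`job_title`, `city`, `state`, `salary_min`, `salary_max`) are assumptions for illustration, while the field names and boosts are copied from the slide. It does not escape Solr special characters, which real profile data would need.

```python
def build_attribute_query(profile):
    """Assemble the slide's attribute-based query from a user profile.

    Hypothetical helper: profile keys are illustrative, boosts (25/10/15)
    and field names (jobtitle, city, state, salary) come from the slide.
    """
    title = profile["job_title"].lower()
    city, state = profile["city"], profile["state"]
    low, high = profile["salary_min"], profile["salary_max"]

    # Exact phrase match on the title outranks a loose term match.
    title_clause = f'(jobtitle:"{title}"^25 OR jobtitle:({title})^10)'
    # Same city and state outranks same state only.
    location_clause = f'((city:"{city}" AND state:"{state}")^15 OR state:"{state}")'
    # map(salary,min,max,10,0): 10 inside the salary range, 0 outside it.
    salary_clause = f'_val_:"map(salary,{low},{high},10,0)"'
    return " AND ".join([title_clause, location_clause, salary_clause])

jane = {
    "job_title": "Nurse Educator",
    "city": "Boston",
    "state": "MA",
    "salary_min": 40000,
    "salary_max": 60000,
}
query = build_attribute_query(jane)
print("/solr/select/?q=" + query)
```

Keeping the boosts in one function makes it straightforward to tune them per business domain, as the slide's closing comment suggests.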
  • 15. Hierarchical Recommendations
      • Example: Match User Attributes to Item Attribute Fields

      Janes_Profile:{
        MostLikelyCategory:"healthcare//nursing//oncology",
        2ndMostLikelyCategory:"healthcare//nursing//transplant",
        3rdMostLikelyCategory:"educator//postsecondary//nursing", …
      }

      /solr/select/?q=category:(
          ("healthcare.nursing.oncology"^40 OR "healthcare.nursing"^20 OR "healthcare"^10)
          OR ("healthcare.nursing.transplant"^20 OR "healthcare.nursing"^10 OR "healthcare"^5)
          OR ("educator.postsecondary.nursing"^10 OR "educator.postsecondary"^5 OR "educator"))
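Each category path on this slide is expanded into itself plus all of its ancestors, with the boost shrinking at every step up the hierarchy. A rough Python sketch of that expansion follows. The helper names are hypothetical, and halving the boost per level is an assumption that matches the slide's first two groups (the slide leaves the last "educator" term unboosted rather than at 2.5).

```python
def expand_category(path, top_boost, sep="//"):
    """Expand 'a//b//c' into '("a.b.c"^B OR "a.b"^B/2 OR "a"^B/4)'.

    Hypothetical helper; halving per level is an illustrative choice.
    """
    parts = path.split(sep)
    clauses = []
    boost = top_boost
    # Walk from the most specific category up to the root.
    for depth in range(len(parts), 0, -1):
        term = ".".join(parts[:depth])
        clauses.append(f'"{term}"^{boost:g}')
        boost /= 2
    return "(" + " OR ".join(clauses) + ")"

def build_hierarchy_query(ranked_categories, boosts=(40, 20, 10)):
    """Combine the user's ranked categories into one category: clause."""
    groups = [expand_category(c, b) for c, b in zip(ranked_categories, boosts)]
    return "category:(" + " OR ".join(groups) + ")"

query = build_hierarchy_query([
    "healthcare//nursing//oncology",
    "healthcare//nursing//transplant",
    "educator//postsecondary//nursing",
])
print(query)
```

A job tagged only with "healthcare" still matches, just with a much lower score than one tagged with the full oncology path.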
  • 16. Textual Similarity-based Recommendations
      • Solr’s More Like This Request Handler / Search Handler are a good example of this.
      • Essentially, “important keywords” are extracted from one or more documents and turned into a search.
      • This results in secondary search results which demonstrate textual similarity to the original document(s)
      • See http://wiki.apache.org/solr/MoreLikeThis for example usage
      • Currently no distributed search support (but a patch is available)
  • 17. Concept Based Recommendations
      Approaches:
      1) Create a Taxonomy/Dictionary to define your concepts and then either:
          a) manually tag documents as they come in
             //Very hard to scale… see Amazon Mechanical Turk if you must do this
          or
          b) create a classification system which automatically tags content as it comes in (supervised machine learning)
             //See Apache Mahout
      2) Use an unsupervised machine learning algorithm to cluster documents and dynamically discover concepts (no dictionary required).
         //This is already built into Solr using Carrot2!
  • 19. Setting Up Clustering in SolrConfig.xml
      <searchComponent name="clustering" enable="true" class="solr.clustering.ClusteringCompo
        <lst name="engine">
          <str name="name">default</str>
          <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
          <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
        </lst>
      </searchComponent>

      <requestHandler name="/clustering" enable="true" class="solr.SearchHandler">
        <lst name="defaults">
          <str name="clustering.engine">default</str>
          <bool name="clustering.results">true</bool>
          <str name="fl">*,score</str>
        </lst>
        <arr name="last-components">
          <str>clustering</str>
        </arr>
      </requestHandler>
  • 20. Clustering Search in Solr
      • /solr/clustering/?q=content:nursing
            &rows=100
            &carrot.title=titlefield
            &carrot.snippet=titlefield
            &LingoClusteringAlgorithm.desiredClusterCountBase=25
            &group=false    //clustering & grouping don’t currently play nicely
      • Allows you to dynamically identify “concepts” and their prevalence within a user’s top search results
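When the clustering request is issued from application code, the parameters are easier to manage as a dictionary than as a hand-written URL. This is a small sketch using only Python's standard library; the base URL `http://localhost:8983/solr` and the function name `clustering_url` are assumptions, and the parameter names are the ones shown on the slide.

```python
from urllib.parse import urlencode

def clustering_url(base, keyword, title_field="titlefield",
                   rows=100, cluster_count_base=25):
    """Build the slide's /clustering request URL (hypothetical helper)."""
    params = {
        "q": f"content:{keyword}",
        "rows": rows,
        "carrot.title": title_field,
        "carrot.snippet": title_field,
        "LingoClusteringAlgorithm.desiredClusterCountBase": cluster_count_base,
        # Per the slide, clustering and grouping don't currently play nicely.
        "group": "false",
    }
    # urlencode percent-encodes reserved characters such as ':' in the query.
    return f"{base}/clustering/?{urlencode(params)}"

url = clustering_url("http://localhost:8983/solr", "nursing")
print(url)
```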
  • 21. Search: Nursing
  • 22. Search: .Net
  • 23. Example Concept-based Recommendation
      Stage 1: Identify Concepts

      Original Query: q=(solr or lucene)
          // can be a user’s search, their job title, a list of skills,
          // or any other keyword rich data source

      Clusters Identified:
          Developer (22)
          Java Developer (13)
          Software (10)
          Senior Java Developer (9)
          Architect (6)
          Software Engineer (6)
          Web Developer (5)
          Search (3)
          Software Developer (3)
          Systems (3)
          Administrator (2)
          Hadoop Engineer (2)
          Java J2EE (2)
          Search Development (2)
          Software Architect (2)
          Solutions Architect (2)
          ...

      Facets Identified (occupation):
          Computer Software Engineers
          Web Developers
          ...
  • 24. Example Concept-based Recommendation
      Stage 2: Run Recommendations Search

      q=content:("Developer"^22 OR "Java Developer"^13 OR "Software"^10 OR "Senior Java Developer"^9 OR "Architect"^6 OR "Software Engineer"^6 OR "Web Developer"^5 OR "Search"^3 OR "Software Developer"^3 OR "Systems"^3 OR "Administrator"^2 OR "Hadoop Engineer"^2 OR "Java J2EE"^2 OR "Search Development"^2 OR "Software Architect"^2 OR "Solutions Architect"^2) AND occupation:("Computer Software Engineers" OR "Web Developers")

      // You can also add the user’s location or the original keywords to the
      // recommendations search if it helps results quality for your use-case.
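Stage 2 reuses each cluster's document count from Stage 1 as its boost. A minimal Python sketch of that step is below; the function name `build_concept_query` and the input shapes (a list of label/count pairs and a list of facet values) are assumptions, and the operators are written in uppercase because Solr's standard query parser only recognizes `OR` and `AND` in that form.

```python
def build_concept_query(clusters, occupations, field="content"):
    """Turn Stage 1 output into the Stage 2 recommendations query.

    Hypothetical helper: `clusters` is a list of (label, doc_count) pairs,
    `occupations` is a list of facet values used to restrict the results.
    """
    # Each cluster label becomes a phrase boosted by its document count.
    concept_clauses = " OR ".join(
        f'"{label.strip()}"^{count}' for label, count in clusters
    )
    occupation_clauses = " OR ".join(f'"{name}"' for name in occupations)
    return f"{field}:({concept_clauses}) AND occupation:({occupation_clauses})"

clusters = [("Developer", 22), ("Java Developer", 13), ("Software", 10)]
occupations = ["Computer Software Engineers", "Web Developers"]
query = build_concept_query(clusters, occupations)
print("q=" + query)
```

Only the top three clusters are used here to keep the example short; the slide passes all sixteen.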
  • 25. Example Concept-based Recommendation
      Stage 3: Returning the Recommendations
      …
  • 27. Geography and Recommendations
      • Filtering or boosting results based upon geographical area or distance can help greatly for certain use cases:
          – Jobs/Resumes, Tickets/Concerts, Restaurants
      • For other use cases, location sensitivity is nearly worthless:
          – Books, Songs, Movies

      /solr/select/?q=(Standard Recommendation Query) AND _val_:"(recip(geodist(location, 40.7142, 74.0064),1,1,0))"

      // there are dozens of well-documented ways to search/filter/sort/boost
      // on geography in Solr. This is just one example.
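Solr's `recip(x,m,a,b)` function computes `a / (m*x + b)`, so with the slide's arguments `(…,1,1,0)` the boost is simply the reciprocal of the distance. The Python sketch below mirrors that arithmetic to show how quickly the boost falls off; the sample distances are made up for illustration.

```python
def recip(x, m, a, b):
    """Same arithmetic as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)

# Boost contributed by the slide's recip(geodist(...),1,1,0) at a few
# illustrative distances (values of geodist, assumed to be in km).
boosts = {distance: recip(distance, 1, 1, 0) for distance in (1, 5, 25, 100)}
for distance, boost in boosts.items():
    print(distance, boost)
```

With `b=0` the function is undefined at a distance of exactly 0, so a small positive `b` may be a safer choice when an item can share the user's exact coordinates.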
  • 28. Behavior-based Recommendation Approaches (Collaborative Filtering)
  • 29. The Lucene Inverted Index (user behavior example)
      What you SEND to Lucene/Solr:

      Document | “Users who bought this product” Field
      doc1     | user1, user4, user5
      doc2     | user2, user3
      doc3     | user4
      doc4     | user4, user5
      doc5     | user4, user1
      …        | …

      How the content is INDEXED into Lucene/Solr (conceptually):

      Term  | Documents
      user1 | doc1, doc5
      user2 | doc2
      user3 | doc2
      user4 | doc1, doc3, doc4, doc5
      user5 | doc1, doc4
      …     | …
  • 30. Collaborative Filtering
      • Step 1: Find similar users who like the same documents

      q=documentid:("doc1" OR "doc4")

      Document | “Users who bought this product” Field
      doc1     | user1, user4, user5
      doc2     | user2, user3
      doc3     | user4
      doc4     | user4, user5
      doc5     | user4, user1
      …        | …

      [Venn diagram: doc1 is liked by user1, user4, user5; doc4 is liked by user4, user5]

      Top Scoring Results (Most Similar Users):
      1) user5 (2 shared likes)
      2) user4 (2 shared likes)
      3) user1 (1 shared like)
  • 31. Collaborative Filtering
      • Step 2: Search for docs “liked” by those similar users

      Most Similar Users:
      1) user5 (2 shared likes)
      2) user4 (2 shared likes)
      3) user1 (1 shared like)

      /solr/select/?q=userlikes:("user5"^2 OR "user4"^2 OR "user1"^1)

      Term  | Documents
      user1 | doc1, doc5
      user2 | doc2
      user3 | doc2
      user4 | doc1, doc3, doc4, doc5
      user5 | doc1, doc4
      …     | …

      Top Recommended Documents:
      1) doc1 (matches user4, user5, user1)
      2) doc4 (matches user4, user5)
      3) doc5 (matches user4, user1)
      4) doc3 (matches user4)
      //doc2 does not match
      //above example ignores idf calculations
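The two collaborative filtering steps can be traced on the slide's own data without Solr. The following Python sketch is an in-memory approximation, not the Solr implementation: it scores a document by summing the weights of the similar users who liked it, which stands in for the query boosts and, like the slide, ignores idf. The function names are hypothetical.

```python
from collections import Counter

# "Users who bought this product" field for each document (from the slide).
likes = {
    "doc1": {"user1", "user4", "user5"},
    "doc2": {"user2", "user3"},
    "doc3": {"user4"},
    "doc4": {"user4", "user5"},
    "doc5": {"user4", "user1"},
}

def similar_users(liked_docs):
    """Step 1: count how many of the given docs each user also liked."""
    counts = Counter()
    for doc_id in liked_docs:
        counts.update(likes[doc_id])
    return counts

def recommend(user_weights):
    """Step 2: score each doc by the summed weights of its matching users."""
    scores = Counter()
    for doc_id, users in likes.items():
        score = sum(user_weights.get(user, 0) for user in users)
        if score:
            scores[doc_id] = score
    return scores.most_common()

weights = similar_users(["doc1", "doc4"])   # the current user likes doc1, doc4
ranking = recommend(weights)
print(dict(weights))
print(ranking)
```

The ranking comes out as doc1, doc4, doc5, doc3, the same order as on the slide, and doc2 is absent because none of the similar users liked it. In practice you would likely filter out documents the user has already interacted with (doc1 and doc4 here).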
  • 32. Lots of Variations
      • Users -> Item(s)
      • User -> Item(s) -> Users
      • Item -> Users -> Item(s)
      • etc.

      [Figure: matrix of Items 1-4 (rows) by Users 1-4 (columns), with an X marking each user-item interaction]

      Note: Just because this example tags with “users” doesn’t mean you have to. You can map any entity to any other related entity and achieve a similar result.
  • 33. Comparison with Mahout
      • Recommendations are much easier for us to perform in Solr:
          – Data is already present and up-to-date
          – Doesn’t require writing significant code to make changes (just changing queries)
          – Recommendations are real-time as opposed to asynchronously processed off-line.
          – Allows easy utilization of any content and available functions to boost results
      • Our initial tests show our collaborative filtering approach in Solr significantly outperforms our Mahout tests in terms of results quality
          – Note: We believe that some portion of the quality issues we have with the Mahout implementation has to do with staleness of data due to the frequency with which our data is updated.
      • Our general takeaway:
          – We believe that Mahout might be able to return better matches than Solr with a lot of custom work, but it does not perform better for us out of the box.
      • Because we already scale…
          – Since we already have all of our data indexed in Solr (tens to hundreds of millions of documents), there’s no need for us to rebuild a sparse matrix in Hadoop (your needs may be different).
  • 35. Hybrid Approaches
      • Not much to say here, I think you get the point.
      • /solr/select/?q=category:("healthcare.nursing.oncology"^10 OR "healthcare.nursing"^5 OR "healthcare") OR title:"Nurse Educator"^15 AND _val_:"map(salary,40000,60000,10,0)"^5 AND _val_:"(recip(geodist(location, 40.7142, 74.0064),1,1,0))"
      • Combining multiple approaches generally yields better overall results if done intelligently. Experimentation is key here.
  • 36. Important Considerations & Advanced Capabilities @ CareerBuilder
  • 37. Important Considerations @ CareerBuilder
      • Payload Scoring
      • Measuring Results Quality
      • Understanding our Users
  • 38. Custom Scoring with Payloads
      • In addition to boosting search terms and fields, content within the same field can also be boosted differently using Payloads (requires a custom scoring implementation):
      • Content Field:
          design [1] / engineer [1] / really [ ] / great [ ] / job [ ] / ten [3] / years [3] / experience [3] / careerbuilder [2] / design [2], …
        Payload Bucket Mappings:
          jobtitle: bucket=[1] boost=10; company: bucket=[2] boost=4;
          jobdescription: bucket=[] weight=1; experience: bucket=[3] weight=1.5
        We can pass in a parameter to Solr at query time specifying the boost to apply to each bucket, e.g. …&bucketWeights=1:10;2:4;3:1.5;default:1;
      • This allows us to map many relevancy buckets to search terms at index time and adjust the weighting at query time without having to search across hundreds of fields.
      • By making all scoring parameters overridable at query time, we are able to do A/B testing to consistently improve our relevancy model
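The bucket idea can be illustrated independently of Lucene's payload API. The Python sketch below is not CareerBuilder's custom scorer: it only shows the arithmetic of tagging each indexed token with a bucket and applying query-time weights parsed from a `bucketWeights`-style string. The function names and the simple "sum of weights of matching tokens" score are assumptions.

```python
def parse_bucket_weights(param):
    """Parse '1:10;2:4;3:1.5;default:1;' into {'1': 10.0, ..., 'default': 1.0}."""
    weights = {}
    for pair in param.strip(";").split(";"):
        bucket, weight = pair.split(":")
        weights[bucket] = float(weight)
    return weights

def score(query_terms, tokens, weights):
    """Sum the bucket weight of every indexed token that matches the query."""
    total = 0.0
    for term, bucket in tokens:
        if term in query_terms:
            # Tokens without a bucket fall back to the default weight.
            total += weights.get(bucket, weights["default"])
    return total

# Content field from the slide: (token, payload bucket); None = no bucket.
tokens = [
    ("design", "1"), ("engineer", "1"), ("really", None), ("great", None),
    ("job", None), ("ten", "3"), ("years", "3"), ("experience", "3"),
    ("careerbuilder", "2"), ("design", "2"),
]

weights = parse_bucket_weights("1:10;2:4;3:1.5;default:1;")
design_score = score({"design"}, tokens, weights)
other_score = score({"job", "experience"}, tokens, weights)
print(design_score, other_score)
```

Because the weights arrive as a request parameter, two variants of an A/B test can score the same index differently without reindexing.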
  • 39. Measuring Results Quality
      • A/B Testing is key to understanding our search results quality.
      • Users are randomly divided between equal groups
      • Each group experiences a different algorithm for the duration of the test
      • We can measure “performance” of the algorithm based upon changes in user behavior:
          – For us, more job applications = more relevant results
          – For other companies, that might translate into products purchased, additional friends requested, or non-search pages viewed
      • We use this to test both keyword search results and also recommendations quality
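One common way to split users into stable, roughly equal groups is to hash the user ID together with the test name. This is a sketch of that technique, not a description of CareerBuilder's system, which the slides only describe as random division; the function name `assign_group` and the test name are hypothetical.

```python
import hashlib

def assign_group(user_id, test_name, groups=("A", "B")):
    """Deterministically assign a user to one of the test groups.

    Hash-based bucketing (an assumption, swapped in for pure random
    assignment) keeps a user in the same group for the whole test.
    """
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]

# Assign 10,000 made-up user IDs and count the size of each group.
sizes = {"A": 0, "B": 0}
for n in range(10000):
    sizes[assign_group(f"user{n}", "payload-weights-test")] += 1
print(sizes)
```

Including the test name in the hash means the same user can land in different groups across different tests, which avoids one cohort always receiving the experimental algorithm.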
  • 40. Understanding our Users (given limited information)
  • 41. Understanding Our Users
      • Machine learning algorithms can help us understand what matters most to different groups of users.

      [Chart: “Example: Willingness to relocate for a job (miles per percentile)”. Y-axis: miles, 0 to 2,500. X-axis: percentile, 1% to 95%. Series: Title Examiners, Abstractors, and Searchers; Software Developers, Systems Software; Food Preparation Workers]
  • 42. Key Takeaways
      • Recommendations can be as valuable as keyword search, or more so.
      • If your data fits in Solr then you have everything you need to build an industry-leading recommendation system
      • Even a single keyword can be enough to begin making meaningful recommendations. Build up intelligently from there.
  • 43. Contact Info
      • Trey Grainger
        trey.grainger@careerbuilder.com
        http://www.careerbuilder.com
        @treygrainger
      And yes, we are hiring – come chat with me if you are interested.