Thinking Lucene Think Lucid

Enhancing Discovery with Solr and Mahout

Grant Ingersoll
Chief Scientist
Lucid Imagination

CONFIDENTIAL
Evolution

Documents
• Models
• Feature Selection

Content Relationships
• Page Rank, etc.
• Organization

User Interaction
• Clicks
• Ratings/Reviews
• Learning to Rank
• Social Graph

Queries
• Phrases
• NLP




Copyright Lucid Imagination CONFIDENTIAL
Minding the Intersection




[Figure: Search, Analytics, Discovery]



Topics

• Background
   – Apache Mahout
   – Apache Solr and Lucene
• Recommendations with Mahout
   – Collaborative Filtering
• Discovery with Solr and Mahout
• Discussion
  




Apache Lucene in a Nutshell

• http://lucene.apache.org/java
• Java-based Application Programming Interface (API) for adding search and indexing functionality to applications
• Fast and efficient scoring and indexing algorithms
• Lots of contributions to make common tasks easier:
   – Highlighting, spatial, Query Parsers, Benchmarking tools, etc.
• Most widely deployed search library on the planet
  
      	
  




Apache Solr in a Nutshell

• http://lucene.apache.org/solr
• Lucene-based Search Server + other features and functionality
• Access Lucene over HTTP:
   – Java, XML, Ruby, Python, .NET, JSON, PHP, etc.
• Most programming tasks in Lucene are taken care of in Solr
• Faceting (guided navigation, filters, etc.)
• Replication and distributed search support
• Lucene Best Practices
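
The "Access Lucene over HTTP" point can be illustrated with a minimal sketch that only builds a request URL and never contacts a server. The base URL (a local Solr on port 8983 with a /select handler), the helper name build_select_url, and the facet field cat are all assumptions made for illustration; q, rows, wt, facet, and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr select handler; adjust for a real install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, facet_field=None, rows=10):
    """Build a Solr select URL that asks for JSON and optionally facets on one field."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field is not None:
        # Faceting is switched on per request and pointed at a field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return SOLR_SELECT + "?" + urlencode(params)


# "cat" is a hypothetical field name used only for this example.
url = build_select_url("ipod", facet_field="cat")
print(url)
```

Any HTTP client in any of the languages listed above could then fetch that URL and parse the JSON response.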
  




Apache Mahout in a Nutshell

http://dictionary.reference.com/browse/mahout

• An Apache Software Foundation project to create scalable machine learning libraries under the Apache Software License
   – http://mahout.apache.org
• The Three C's:
   – Collaborative Filtering (recommenders)
   – Clustering
   – Classification
• Others:
   – Frequent Item Mining
   – Primitive collections
   – Math stuff
  


Recommendations with Mahout
Recommenders

• Collaborative Filtering (CF)
   – Provide recommendations solely based on preferences expressed between users and items
   – "People who watched this also watched that"
• Content-based Recommendations (CBR)
   – Provide recommendations based on the attributes of the items and user profile
   – 'Modern Family' is a sitcom, Bob likes sitcoms
      • => Suggest Modern Family to Bob
• Mahout geared towards CF, can be extended to do CBR
   – Classification can also be used for CBR
• Aside: search engines can also solve these problems
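
The "people who watched this also watched that" idea can be sketched as plain co-occurrence counting. This is a toy illustration, not Mahout's implementation; the viewing history and the function name also_watched are hypothetical.

```python
from collections import Counter

# Hypothetical viewing history: user -> set of titles watched.
history = {
    "alice": {"Modern Family", "The Office", "Parks and Recreation"},
    "bob": {"Modern Family", "The Office"},
    "carol": {"Modern Family", "Dracula"},
}


def also_watched(title, history):
    """Count how often every other title appears alongside `title` across users."""
    counts = Counter()
    for titles in history.values():
        if title in titles:
            counts.update(titles - {title})
    # Most frequently co-watched titles come first.
    return counts.most_common()


result = also_watched("Modern Family", history)
print(result)
```

Only the preferences themselves are used here; no item attributes (genre, cast) are consulted, which is what separates CF from CBR.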
  


To Rate or Not?

• In many instances, users don't provide actual ratings
   – Clicks, views, etc.
• Non-Boolean ratings can also often introduce unnecessary noise
   – Even a low rating often has a positive correlation with highly rated items in the real world
• Example: Should we recommend Frankenstein to Bob?

        Dracula   Jane Eyre   Frankenstein   Java Programming
Bob     1         4           ???            -
Mary    5         1           4              -
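
One way to read the example is to compare a rating-based similarity with a Boolean one. The sketch below uses the ratings as read from the table (Bob: Dracula 1, Jane Eyre 4; Mary: Dracula 5, Jane Eyre 1, Frankenstein 4). Pearson correlation and the Tanimoto coefficient are swapped in here as two common choices; the deck does not say which measure it has in mind.

```python
from math import sqrt

# Ratings as read from the table; Bob has not rated Frankenstein.
ratings = {
    "Bob": {"Dracula": 1, "Jane Eyre": 4},
    "Mary": {"Dracula": 5, "Jane Eyre": 1, "Frankenstein": 4},
}


def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    shared = sorted(set(a) & set(b))
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var)


def tanimoto(a, b):
    """Boolean similarity: items in common divided by items either user touched."""
    return len(set(a) & set(b)) / len(set(a) | set(b))


rating_sim = pearson(ratings["Bob"], ratings["Mary"])
boolean_sim = tanimoto(ratings["Bob"], ratings["Mary"])
print(rating_sim, boolean_sim)
```

With ratings, Bob and Mary look like opposites; with Boolean preferences they overlap on two of three books, which would make Frankenstein a plausible suggestion for Bob.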



Collaborative Filtering with Mahout

• Extensive framework for collaborative filtering
• Recommenders
   – User based
   – Item based
   – Slope One
• Online and Offline support
   – Offline can utilize Hadoop

          Item 1   Item 2   …   Item m
User 1    -        0.5          0.9
User 2    0.1      0.3          -
…
User n    0.8      0.7          0.1

Recommendations for User X
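
A user-based recommender over a matrix of this shape can be sketched in a few lines. This is a toy illustration and not Mahout's API: the values are copied from the matrix above with the elided rows and columns left out, and cosine similarity plus the function names are assumptions chosen for brevity.

```python
from math import sqrt

# Preference values from the matrix above; missing entries ("-") are simply absent.
prefs = {
    "User 1": {"Item 2": 0.5, "Item m": 0.9},
    "User 2": {"Item 1": 0.1, "Item 2": 0.3},
    "User n": {"Item 1": 0.8, "Item 2": 0.7, "Item m": 0.1},
}


def cosine(a, b):
    """Cosine similarity between two sparse preference vectors."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if dot else 0.0


def recommend(user, prefs):
    """Score items the user has not rated by similarity-weighted neighbour ratings."""
    scores, weights = {}, {}
    for other, other_prefs in prefs.items():
        if other == user:
            continue
        sim = cosine(prefs[user], other_prefs)
        for item, value in other_prefs.items():
            if item not in prefs[user]:
                scores[item] = scores.get(item, 0.0) + sim * value
                weights[item] = weights.get(item, 0.0) + sim
    ranked = [(scores[i] / weights[i], i) for i in scores if weights[i] > 0]
    return sorted(ranked, reverse=True)


recs = recommend("User 1", prefs)
print(recs)
```

An item-based recommender would do the same computation with the roles of users and items swapped.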

User Similarity

What should we recommend for User 1?

[Figure: User 1, User 2, User 3, User 4; Item 1, Item 2, Item 3, Item 4]
  



Item Similarity

What should we recommend for User 1?

[Figure: User 1, User 2, User 3, User 4; Item 1, Item 2, Item 3, Item 4]
  



Slope One

User   Item 1   Item 2
A      3.5      2
B      ?        3

User A: 3.5 - 2 = 1.5
Item 1 (User B) = 3 + 1.5 = 4.5

• Intuition: There is a linear relationship between rated items
   – Y = mX + b where m = 1
• Solve for b upfront based on existing ratings: b = (Y-X)
   – Find the average difference in preference value for every pair of items
• Online can be very fast, but requires up front computation and memory
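
The arithmetic above can be reproduced with a minimal weighted Slope One sketch. It is a single-machine illustration under the slide's two-user data, not Mahout's implementation; the function names average_diffs and predict are hypothetical.

```python
from collections import defaultdict

# Ratings from the slide: user -> item -> rating (User B has not rated Item 1).
ratings = {
    "A": {"Item 1": 3.5, "Item 2": 2.0},
    "B": {"Item 2": 3.0},
}


def average_diffs(ratings):
    """Precompute b = average(Y - X) for every ordered pair of items (the up-front step)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for prefs in ratings.values():
        for i, ri in prefs.items():
            for j, rj in prefs.items():
                if i != j:
                    totals[(i, j)] += ri - rj
                    counts[(i, j)] += 1
    diffs = {pair: totals[pair] / counts[pair] for pair in totals}
    return diffs, counts


def predict(user, item, ratings, diffs, counts):
    """Estimate a rating as (known rating + average offset), weighted by pair support."""
    num = den = 0.0
    for j, rj in ratings[user].items():
        if (item, j) in diffs:
            n = counts[(item, j)]
            num += (rj + diffs[(item, j)]) * n
            den += n
    return num / den if den else None


diffs, counts = average_diffs(ratings)
prediction = predict("B", "Item 1", ratings, diffs, counts)
print(prediction)  # 4.5
```

The diffs table is what costs memory: it holds one entry per ordered item pair, which is why the online lookup is fast once it has been built.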
  


Online and Offline Recommendations

• Online
   – Predates Hadoop
   – Designed to run on a single node
      • Matrix size of ~ 100M interactions
   – API for integrating with your application
• Offline
   – Hadoop based
   – Designed to run on large cluster
   – Several approaches:
      • RecommenderJob, ItemSimilarityJob, ParallelALSFactorizationJob
  




RecommenderJob

• Essentially does matrix multiplication using distributed techniques
• $MAHOUT_HOME/bin/examples/asf-email-examples.sh

       101   102   103   104   105       User A       Recs
101    7     2     0     1     3         3.0          30
102    2     8     3     5     2         0            37
103    0     3     3     6     4     X   4.0     =    38
104    1     5     6     4     7         3.0          53
105    3     2     4     7     9         2.0          64
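
The multiplication above can be checked on a single machine with a short sketch. The numbers are the slide's; treating the 5x5 matrix as an item-item co-occurrence matrix and filtering out items User A already has a preference for are assumptions about how the result would be used, and nothing here is distributed.

```python
# Item IDs, the 5x5 item-item matrix, and User A's preference vector from the slide.
items = [101, 102, 103, 104, 105]
matrix = [
    [7, 2, 0, 1, 3],
    [2, 8, 3, 5, 2],
    [0, 3, 3, 6, 4],
    [1, 5, 6, 4, 7],
    [3, 2, 4, 7, 9],
]
user_a = [3.0, 0.0, 4.0, 3.0, 2.0]

# Matrix-vector product: one dot product per row.
recs = [sum(cell * pref for cell, pref in zip(row, user_a)) for row in matrix]

# Assumed post-processing: only items without an existing preference are candidates.
candidates = sorted(
    ((score, item) for item, score, pref in zip(items, recs, user_a) if pref == 0),
    reverse=True,
)

print(recs)        # [30.0, 37.0, 38.0, 53.0, 64.0]
print(candidates)  # [(37.0, 102)]
```

Each row's dot product is independent of the others, which is what makes the computation easy to spread across a cluster.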



Discovery with Solr

• Goals:
   – Guide users to results without having to guess at keywords
   – Encourage serendipity
   – Never show empty results
• Out of the Box:
   – Faceting
   – Spell Checking
   – More Like This
   – Clustering (Carrot2)
• Extend
   – Clustering (with Mahout)
   – Frequent Item Mining (with Mahout)
  


Clustering

• Automatically group similar content together to aid users in discovering related items and/or avoiding repetitive content
• Solr has search result clustering
   – Pluggable
   – Default implementation uses Carrot2
• Mahout has Hadoop based large scale clustering
   – K-Means, Minhash, Dirichlet, Canopy, Spectral, etc.
  




Discovery In Action

• Pre-reqs:
   – Apache Ant 1.7.x, Subversion (SVN)
• Command Line 1:
   – svn co https://svn.apache.org/repos/asf/lucene/dev/trunk solr-trunk
   – cd solr-trunk/solr/
   – ant example
   – cd example
   – java -Dsolr.clustering.enabled=true -jar start.jar
• Command Line 2:
   – cd exampledocs; java -jar post.jar *.xml
• http://localhost:8983/solr/browse?q=&debugQuery=true&annotateBrowse=true
  
        	
  


Solr + Mahout
Basics

• Most Mahout tasks are offline
• Solr provides many touch points for integration:
   – ClusteringEngine
      • Clustering results
   – SearchComponent
      • Suggestions – Related searches, clusters, MLT, spellchecking
   – UpdateProcessor
      • Classification of documents
   – FunctionQuery
  




Example: Frequent Itemset Mining

• Discover frequently co-occurring items
• Use Case: Related Searches from Solr Logs
• Hadoop and sequential versions
   – Parallel FP Growth
• Input:
   – <optional document id>TAB<TOKEN1>SPACE<TOKEN2>SPACE
   – Comma, pipe also allowed as delimiters
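
To make the input format and the notion of "frequently co-occurring" concrete, here is a brute-force sketch that counts item pairs and keeps those meeting a minimum support. It is deliberately not FP-Growth, only a small stand-in for the idea; the sample lines, function names, and the support threshold of 2 are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: "<optional id>TAB<TOKEN1>SPACE<TOKEN2>...".
lines = [
    "1\tsolr faceting",
    "2\tsolr faceting tutorial",
    "3\tmahout clustering",
    "solr tutorial",  # the id and tab are optional
]


def parse(line):
    """Drop the optional id before the tab and return the sorted, de-duplicated tokens."""
    _, _, tokens = line.rpartition("\t")
    return sorted(set(tokens.split()))


def frequent_pairs(lines, min_support=2):
    """Count every pair of tokens seen in the same transaction; keep the frequent ones."""
    counts = Counter()
    for line in lines:
        counts.update(combinations(parse(line), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(lines)
print(pairs)
```

Counting all pairs grows quadratically with transaction length, which is the cost that FP-Growth style algorithms are designed to avoid.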
  



FIM on Solr Query Logs

• Goal:
   – Extract user queries from Solr logs
   – Feed into FIM to generate Related Keyword Searches
• Context:
   – Solr Query logs
   – bin/mahout regexconverter --input $PATH_TO_LOGS --output /tmp/solr/output --regex "(?<=(\?|&)q=).*?(?=&|$)" --overwrite --transformerClass url --formatterClass fpg
   – bin/mahout fpg --input /tmp/solr/output/ -o /tmp/solr/fim/output -k 25 -s 2 --method mapreduce
   – bin/mahout seqdumper --seqFile /tmp/solr2/results/frequentpatterns/part-r-00000
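
The extraction step can be approximated in a few lines to show what the regular expression is after. This sketch is not what regexconverter runs internally: the log lines are hypothetical, a character class [?&] stands in for the slide's alternation, and URL decoding via unquote_plus is an assumption about what the url transformer roughly achieves.

```python
import re
from urllib.parse import unquote_plus

# Match the value of the q parameter: preceded by "?q=" or "&q=", up to the next "&" or line end.
QUERY_RE = re.compile(r"(?<=[?&]q=).*?(?=&|$)")

# Hypothetical request lines as they might appear in a Solr log.
log_lines = [
    "GET /solr/select?q=chris+hostetter&rows=10",
    "GET /solr/select?fq=type:doc&q=faceted%20search",
    "GET /solr/admin/ping",
]


def extract_queries(lines):
    """Return the URL-decoded q parameter of every line that has a non-empty one."""
    queries = []
    for line in lines:
        match = QUERY_RE.search(line)
        if match and match.group(0):
            queries.append(unquote_plus(match.group(0)))
    return queries


queries = extract_queries(log_lines)
print(queries)  # ['chris hostetter', 'faceted search']
```

The lookbehind keeps fq= from being mistaken for q=, since the character immediately before "q=" must be "?" or "&".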
  

Output

• Key: Chris: Value: ([Chris, Hostetter],870), ([Chris],870), ([Search, Faceted, Chris, Hostetter, Webcast, Power, Mastering],18), ([Search, Faceted, Chris, Hostetter, Webcast, Power],18), ([Search, Faceted, Chris, Hostetter],18), ([Solr, new, Chris, Hostetter, webcast, along, sponsors, DZone, QA, Refcard],12), ([Solr, new, Chris, Hostetter, webcast, along, sponsors, DZone],12), ([Solr, new, Chris, Hostetter, webcast, along, sponsors],12), ([Solr, new, Chris, Hostetter, webcast, along],12), ([Solr, new, Chris, Hostetter, webcast],12), ([Solr, new, Chris, Hostetter],12)
  




Resources

• http://lucene.apache.org
• http://mahout.apache.org
• http://manning.com/owen
• http://manning.com/ingersoll
• http://www.lucidimagination.com
• grant@lucidimagination.com
• @gsingers
  




Appendix
Mahout Overview

[Figure: Mahout component stack, top to bottom]
Applications
Examples
Genetic | Freq. Pattern Mining | Classification | Clustering | Recommenders
Utilities/Integration Lucene/Vectorizer | Math Vectors/Matrices/SVD | Collections (primitives) | Apache Hadoop

See http://cwiki.apache.org/confluence/display/MAHOUT/Algorithms

                                                 Copyright	
  Lucid	
  Imagina@on	
     CONFIDENTIAL	
  	
  	
  	
  	
  	
  |	
     	
  28	
  	
  

Weitere ähnliche Inhalte

Was ist angesagt?

Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Lucidworks
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrTrey Grainger
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchTrey Grainger
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewKevin Watters
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchTrey Grainger
 
Self-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrSelf-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrTrey Grainger
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformTrey Grainger
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsTrey Grainger
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Lucidworks
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrLucidworks
 
Balancing the Dimensions of User Intent
Balancing the Dimensions of User IntentBalancing the Dimensions of User Intent
Balancing the Dimensions of User IntentTrey Grainger
 
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Lucidworks
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemTrey Grainger
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engineLars Marius Garshol
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Trey Grainger
 
Lucene And Solr Document Classification
Lucene And Solr Document ClassificationLucene And Solr Document Classification
Lucene And Solr Document ClassificationAlessandro Benedetti
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation EnginesTrey Grainger
 

Was ist angesagt? (20)

Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/Solr
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered Search
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query Overview
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Self-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrSelf-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache Solr
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with Solr
 
Balancing the Dimensions of User Intent
Balancing the Dimensions of User IntentBalancing the Dimensions of User Intent
Balancing the Dimensions of User Intent
 
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data system
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
 
Lucene And Solr Document Classification
Lucene And Solr Document ClassificationLucene And Solr Document Classification
Lucene And Solr Document Classification
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 

Enhance discovery Solr and Mahout

  • 1. Thinking Lucene, Think Lucid: Enhancing Discovery with Solr and Mahout
      - Grant Ingersoll, Chief Scientist, Lucid Imagination
  • 2. Evolution (Copyright Lucid Imagination, CONFIDENTIAL)
      - Documents: Models, Feature Selection
      - Content Relationships: Page Rank, etc.; Organization
      - User Interaction: Clicks, Ratings/Reviews, Learning to Rank, Social Graph
      - Queries: Phrases, NLP
  • 3. Minding the Intersection
      - Diagram labels: Search, Analytics, Discovery
  • 4. Topics
      - Background
          - Apache Mahout
          - Apache Solr and Lucene
      - Recommendations with Mahout
          - Collaborative Filtering
      - Discovery with Solr and Mahout
      - Discussion
  • 5. Apache Lucene in a Nutshell
      - http://lucene.apache.org/java
      - Java-based Application Programming Interface (API) for adding search and indexing functionality to applications
      - Fast and efficient scoring and indexing algorithms
      - Lots of contributions to make common tasks easier:
          - Highlighting, spatial, Query Parsers, Benchmarking tools, etc.
      - Most widely deployed search library on the planet
  • 6. Apache Solr in a Nutshell
      - http://lucene.apache.org/solr
      - Lucene-based Search Server + other features and functionality
      - Access Lucene over HTTP:
          - Java, XML, Ruby, Python, .NET, JSON, PHP, etc.
      - Most programming tasks in Lucene are taken care of in Solr
      - Faceting (guided navigation, filters, etc.)
      - Replication and distributed search support
      - Lucene Best Practices
  • 7. Apache Mahout in a Nutshell
      - Definition of "mahout": http://dictionary.reference.com/browse/mahout
      - An Apache Software Foundation project to create scalable machine learning libraries under the Apache Software License
          - http://mahout.apache.org
      - The Three C's:
          - Collaborative Filtering (recommenders)
          - Clustering
          - Classification
      - Others:
          - Frequent Item Mining
          - Primitive collections
          - Math stuff
  • 8. Recommendations with Mahout
  • 9. Recommenders
      - Collaborative Filtering (CF)
          - Provide recommendations solely based on preferences expressed between users and items
          - "People who watched this also watched that"
      - Content-based Recommendations (CBR)
          - Provide recommendations based on the attributes of the items and the user profile
          - 'Modern Family' is a sitcom, Bob likes sitcoms => Suggest Modern Family to Bob
      - Mahout is geared towards CF, but can be extended to do CBR
          - Classification can also be used for CBR
      - Aside: search engines can also solve these problems
  • 10. To Rate or Not?
      - In many instances, users don't provide actual ratings
          - Clicks, views, etc.
      - Non-Boolean ratings can also often introduce unnecessary noise
          - Even a low rating often has a positive correlation with highly rated items in the real world
      - Example: Should we recommend Frankenstein to Bob?

                 Dracula   Jane Eyre   Frankenstein
          Bob       1          4           ???
          Mary      5          1            4

                 Dracula   Jane Eyre   Frankenstein   Java Programming
          Bob       1          4           ???               -
          Mary      5          1            4                -
  • 11. Collaborative Filtering with Mahout
      - Extensive framework for collaborative filtering
      - Recommenders
          - User based
          - Item based
          - Slope One
      - Online and Offline support
          - Offline can utilize Hadoop
      - Preference matrix, used to produce recommendations for User X:

                    Item 1   Item 2   ...   Item m
          User 1      -       0.5             0.9
          User 2     0.1      0.3              -
          ...
          User n     0.8      0.7             0.1

  • 12. User Similarity
      - What should we recommend for User 1?
      - Diagram: Users 1-4 linked to Items 1-4 (links not recoverable from the transcript)
  • 13. Item Similarity
      - What should we recommend for User 1?
      - Diagram: Users 1-4 linked to Items 1-4 (links not recoverable from the transcript)
  • 14. Slope One

          User   Item 1   Item 2
          A       3.5       2
          B        ?        3

      - User A: 3.5 - 2 = 1.5
      - Item 1 (User B) = 3 + 1.5 = 4.5
      - Intuition: There is a linear relationship between rated items
          - Y = mX + b where m = 1
      - Solve for b upfront based on existing ratings: b = (Y - X)
          - Find the average difference in preference value for every pair of items
      - Online can be very fast, but requires up-front computation and memory
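The Slope One arithmetic on slide 14 can be sketched in a few lines of Python. This is only an illustration of the idea (the average per-pair difference, weighted by how many users rated both items), not Mahout's implementation; the function names and the dictionary layout of the ratings are assumptions made for the example.

```python
from collections import defaultdict


def average_diffs(ratings):
    """Return ({(i, j): average of (rating_i - rating_j)}, {(i, j): co-rating count})."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for prefs in ratings.values():
        for i, rating_i in prefs.items():
            for j, rating_j in prefs.items():
                if i != j:
                    sums[(i, j)] += rating_i - rating_j
                    counts[(i, j)] += 1
    diffs = {pair: sums[pair] / counts[pair] for pair in sums}
    return diffs, dict(counts)


def predict(ratings, user, target):
    """Weighted Slope One estimate of `user`'s preference for `target`, or None."""
    diffs, counts = average_diffs(ratings)
    numerator = 0.0
    denominator = 0
    for item, rating in ratings[user].items():
        pair = (target, item)
        if pair in diffs:
            # b = average(Y - X) for this pair; the prediction is X + b.
            numerator += (rating + diffs[pair]) * counts[pair]
            denominator += counts[pair]
    return numerator / denominator if denominator else None


# Ratings from the slide: User A rated both items, User B only Item 2.
RATINGS = {
    "A": {"item1": 3.5, "item2": 2.0},
    "B": {"item2": 3.0},
}

print(predict(RATINGS, "B", "item1"))
```

With the slide's two users this reproduces the 3 + 1.5 = 4.5 estimate. The pairwise differences are the "up-front computation and memory" cost the slide mentions; a real deployment would compute them once rather than on every call.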
  • 15. Online and Offline Recommendations
      - Online
          - Predates Hadoop
          - Designed to run on a single node
              - Matrix size of ~100M interactions
          - API for integrating with your application
      - Offline
          - Hadoop based
          - Designed to run on a large cluster
          - Several approaches:
              - RecommenderJob, ItemSimilarityJob, ParallelALSFactorizationJob
  • 16. RecommenderJob
      - Essentially does matrix multiplication using distributed techniques
      - $MAHOUT_HOME/bin/examples/asf-email-examples.sh

                 101  102  103  104  105       User A       Recs
          101     7    2    0    1    3         3.0          30
          102     2    8    3    5    2          0           37
          103     0    3    3    6    4    X    4.0     =    38
          104     1    5    6    4    7         3.0          53
          105     3    2    4    7    9         2.0          64
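The multiplication on slide 16 can be checked on a single machine with plain Python. This sketch assumes the 5x5 matrix holds item-to-item co-occurrence counts and that a 0 in User A's column means "no preference yet"; the slide states neither, and the function names are invented for the example. It does not use Mahout or Hadoop.

```python
# Matrix and preference vector copied from slide 16 (items 101-105).
ITEMS = [101, 102, 103, 104, 105]
COOCCURRENCE = [
    [7, 2, 0, 1, 3],
    [2, 8, 3, 5, 2],
    [0, 3, 3, 6, 4],
    [1, 5, 6, 4, 7],
    [3, 2, 4, 7, 9],
]
USER_A = [3.0, 0.0, 4.0, 3.0, 2.0]


def score_items(matrix, prefs):
    """Multiply the item matrix by the user's preference vector."""
    return [sum(cell * pref for cell, pref in zip(row, prefs)) for row in matrix]


def recommend(items, matrix, prefs):
    """Rank items the user has no preference for, highest score first."""
    scores = score_items(matrix, prefs)
    unseen = [(score, item) for item, score, pref in zip(items, scores, prefs) if pref == 0.0]
    return [item for score, item in sorted(unseen, reverse=True)]


SCORES = score_items(COOCCURRENCE, USER_A)
print(SCORES)                                  # matches the "Recs" column: 30, 37, 38, 53, 64
print(recommend(ITEMS, COOCCURRENCE, USER_A))  # item 102 is the only unrated item
```

Each entry of the result is one row of the matrix dotted with the user's vector, for example 7*3 + 2*0 + 0*4 + 1*3 + 3*2 = 30 for item 101. The distributed job splits this same computation across the cluster.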
  • 17. Discovery with Solr
  • 18. Discovery with Solr
      - Goals:
          - Guide users to results without having to guess at keywords
          - Encourage serendipity
          - Never show empty results
      - Out of the Box:
          - Faceting
          - Spell Checking
          - More Like This
          - Clustering (Carrot2)
      - Extend
          - Clustering (with Mahout)
          - Frequent Item Mining (with Mahout)
  • 19. Clustering
      - Automatically group similar content together to aid users in discovering related items and/or avoiding repetitive content
      - Solr has search result clustering
          - Pluggable
          - Default implementation uses Carrot2
      - Mahout has Hadoop-based large-scale clustering
          - K-Means, Minhash, Dirichlet, Canopy, Spectral, etc.
  • 20. Discovery In Action
      - Pre-reqs:
          - Apache Ant 1.7.x, Subversion (SVN)
      - Command Line 1:
          - svn co https://svn.apache.org/repos/asf/lucene/dev/trunk solr-trunk
          - cd solr-trunk/solr/
          - ant example
          - cd example
          - java -Dsolr.clustering.enabled=true -jar start.jar
      - Command Line 2:
          - cd exampledocs; java -jar post.jar *.xml
      - http://localhost:8983/solr/browse?q=&debugQuery=true&annotateBrowse=true
  • 21. Solr + Mahout
  • 22. Basics
      - Most Mahout tasks are offline
      - Solr provides many touch points for integration:
          - ClusteringEngine
              - Clustering results
          - SearchComponent
              - Suggestions: related searches, clusters, MLT, spellchecking
          - UpdateProcessor
              - Classification of documents
          - FunctionQuery
  • 23. Example: Frequent Itemset Mining
      - Discover frequently co-occurring items
      - Use Case: Related Searches from Solr Logs
      - Hadoop and sequential versions
          - Parallel FP Growth
      - Input:
          - <optional document id>TAB<TOKEN1>SPACE<TOKEN2>SPACE
          - Comma, pipe also allowed as delimiters
  • 24. FIM on Solr Query Logs
      - Goal:
          - Extract user queries from Solr logs
          - Feed into FIM to generate Related Keyword Searches
      - Context:
          - Solr Query logs
          - bin/mahout regexconverter --input $PATH_TO_LOGS --output /tmp/solr/output --regex "(?<=(\?|&)q=).*?(?=&|$)" --overwrite --transformerClass url --formatterClass fpg
          - bin/mahout fpg --input /tmp/solr/output/ -o /tmp/solr/fim/output -k 25 -s 2 --method mapreduce
          - bin/mahout seqdumper --seqFile /tmp/solr2/results/frequentpatterns/part-r-00000
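To see what the extraction step on slide 24 is after, the same idea can be tried in Python on a few made-up request lines. This is a sketch under assumptions, not what `regexconverter` does internally: the sample lines are hypothetical, the lookbehind is written as `[?&]` (equivalent to the slide's alternation for this purpose), and the space-separated output only mirrors the token format described on slide 23.

```python
import re
from urllib.parse import unquote_plus

# Capture the value of the q parameter, up to the next "&" or the end of the line.
QUERY_PATTERN = re.compile(r"(?<=[?&]q=).*?(?=&|$)")


def extract_queries(log_lines):
    """Return one space-separated token string per non-empty q= value."""
    transactions = []
    for line in log_lines:
        match = QUERY_PATTERN.search(line.strip())
        if match and match.group(0):
            # URL-decode ("+" and "%20" become spaces), then normalise whitespace.
            tokens = unquote_plus(match.group(0)).split()
            transactions.append(" ".join(tokens))
    return transactions


# Hypothetical request lines, invented for this example.
SAMPLE_LOG = [
    "/solr/select?q=chris+hostetter&rows=10",
    "/solr/select?rows=5&q=faceted%20search",
    "/solr/admin/ping",
    "/solr/select?q=&fq=type:webcast",
]

for transaction in extract_queries(SAMPLE_LOG):
    print(transaction)
```

Each printed line is one "transaction" of query terms. Requests without a `q` parameter or with an empty one are skipped, and `fq=` is not mistaken for `q=` because the lookbehind requires `?` or `&` immediately before the `q`.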
  • 25. Output
      - Key: Chris: Value: ([Chris, Hostetter],870), ([Chris],870), ([Search, Faceted, Chris, Hostetter, Webcast, Power, Mastering],18), ([Search, Faceted, Chris, Hostetter, Webcast, Power],18), ([Search, Faceted, Chris, Hostetter],18), ([Solr, new, Chris, Hostetter, webcast, along, sponsors, DZone, QA, Refcard],12), ([Solr, new, Chris, Hostetter, webcast, along, sponsors, DZone],12), ([Solr, new, Chris, Hostetter, webcast, along, sponsors],12), ([Solr, new, Chris, Hostetter, webcast, along],12), ([Solr, new, Chris, Hostetter, webcast],12), ([Solr, new, Chris, Hostetter],12)
  • 26. Resources
      - http://lucene.apache.org
      - http://mahout.apache.org
      - http://manning.com/owen
      - http://manning.com/ingersoll
      - http://www.lucidimagination.com
      - grant@lucidimagination.com
      - @gsingers
  • 27. Appendix
  • 28. Mahout Overview
      - Diagram layers, top to bottom:
          - Applications
          - Examples
          - Genetic, Freq. Pattern Mining, Classification, Clustering, Recommenders
          - Math (Vectors/Matrices/SVD), Utilities/Integration (Lucene/Vectorizer), Collections (primitives), Apache Hadoop
      - See http://cwiki.apache.org/confluence/display/MAHOUT/Algorithms