Apache SOLR in AEM 6

Introduction to Apache SOLR and configuring Apache SOLR with AEM 6

  1. Introduction to Apache SOLR in Adobe AEM 6: Dr. Yash Mody, PhD, CTO | Tekno Point Consulting
  2. About Me: Adobe AEM and Apache Hadoop instructor and consultant; application architecture and design consultant. Need I say more? www.teknopoint.us
  4. Information Retrieval: document; term; inverted index; term frequency (tf); skip pointers; positional index; collection frequency; document frequency (df); inverse document frequency, idf = log10(N/df), where N is the number of documents in the collection; term frequency-inverse document frequency, tf-idf = tf * idf.
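The terms on this slide fit together roughly as follows. This is a minimal sketch, assuming a toy three-document collection and a naive lower-case tokenizer (neither comes from the slides): it builds a positional inverted index and computes tf, df, collection frequency, idf = log10(N/df) and tf-idf = tf * idf as defined above. Skip pointers are not shown, and Lucene's own default scoring uses a smoothed variant of these formulas, so its scores will differ.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case and split on non-alphanumeric characters (a deliberately naive tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_positional_index(docs):
    """Inverted index that also records positions: term -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def tf(index, term, doc_id):
    """Term frequency: occurrences of term in one document."""
    return len(index.get(term, {}).get(doc_id, []))


def df(index, term):
    """Document frequency: number of documents containing term."""
    return len(index.get(term, {}))


def cf(index, term):
    """Collection frequency: total occurrences of term across all documents."""
    return sum(len(positions) for positions in index.get(term, {}).values())


def idf(index, term, n_docs):
    """idf = log10(N / df); returns 0.0 for unseen terms (an assumption, since df = 0 is undefined)."""
    d = df(index, term)
    return math.log10(n_docs / d) if d else 0.0


def tf_idf(index, term, doc_id, n_docs):
    """tf-idf = tf * idf, using raw counts as on the slide."""
    return tf(index, term, doc_id) * idf(index, term, n_docs)


# Hypothetical example collection.
docs = {
    1: "Solr is built on Lucene",
    2: "Lucene is a search library",
    3: "AEM can use Solr for search; Solr scales",
}
index = build_positional_index(docs)
N = len(docs)

print(index["solr"])  # positions of "solr" per document
print(round(tf_idf(index, "solr", 3, N), 4))  # 0.3522
```

Because "solr" appears in two of three documents, its idf is log10(3/2), and document 3 doubles that weight by containing the term twice.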
  6. Apache SOLR: fire-powered Lucene; distributed, replicated, remote. And just for the record, it's… Search On Lucene w/ Replication (TBHPHB)
  7. Installation: unpack the SOLR distribution, add solr.war to the servlet container's webapps directory and pass -Dsolr.solr.home=… to the JVM; alternatively, use the Bitnami stack at http://bitnami.com/stack/solr
  8. Getting SOLR ready: to start SOLR with the bundled Jetty, run cd /usr/local/Cellar/solr/4.7.2/libexec/example/ and then java -jar start.jar; the admin UI is then available at http://localhost:8983/solr/#/. Adding content using
  9. Index and search: index data with java -jar post.jar solr.xml, then search with http://localhost:8983/solr/select?q=solr&wt=json
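The search URL on this slide can also be built and consumed from code. A minimal sketch using only the Python standard library, assuming SOLR runs at the localhost:8983 address from the slides; `select_url`, `parse_response` and `search` are hypothetical helper names, and the sample body is an abbreviated, illustrative response rather than output captured from a real server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # address used on the slides; adjust for your install

# Abbreviated, illustrative body in SOLR's JSON response layout (not captured from a real server).
SAMPLE_BODY = """{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 1, "start": 0,
              "docs": [{"id": "SOLR1000", "name": "Solr, the Enterprise Search Server"}]}}"""


def select_url(query, base=SOLR_BASE, **params):
    """Build a /select URL; wt=json asks SOLR to reply in JSON instead of the default XML."""
    qs = urlencode({"q": query, "wt": "json", **params})
    return f"{base}/select?{qs}"


def parse_response(body):
    """Return (hit count, list of documents) from a SOLR JSON response body."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


def search(query, **params):
    """Run a query against a live SOLR server (raises if nothing is listening at SOLR_BASE)."""
    with urlopen(select_url(query, **params)) as resp:
        return parse_response(resp.read().decode("utf-8"))


print(select_url("solr"))  # the same URL as on the slide
print(parse_response(SAMPLE_BODY))
```

Only `search` touches the network; keeping URL building and parsing separate makes both easy to check without a running server.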
  10. Configuration: SOLR is configured through two XML files: schema.xml, which defines the index (fields and field types), and solrconfig.xml, which holds the core's own settings (request handlers, caches, update behaviour).
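To make the schema.xml side concrete, here is a hypothetical, heavily trimmed schema in the SOLR 4.x layout (field types, fields, unique key), parsed with Python's ElementTree to show what each field declaration says. The field names and types are assumptions chosen for illustration; solrconfig.xml is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed schema.xml in the SOLR 4.x layout.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="body" type="text_general" indexed="true" stored="false"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def summarize_schema(xml_text):
    """Return the unique key and a {field name: (type, indexed, stored)} map."""
    root = ET.fromstring(xml_text.strip())
    fields = {
        f.get("name"): (f.get("type"), f.get("indexed") == "true", f.get("stored") == "true")
        for f in root.iter("field")  # matches <field> only, not <fieldType>
    }
    return root.findtext("uniqueKey"), fields


print(summarize_schema(SCHEMA_XML))
```

The `body` field is searchable but not stored, so it can match queries without being returned in results; that kind of per-field decision is what schema.xml controls.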
  11. Indexing: indexing uses HTTP POST, so documents can be pushed to SOLR with an ordinary web request. Data can also be pulled in with the Data Import Handler (over HTTP or from a database). SOLR can also index binary content (extracted text plus metadata) from documents, video, MP3s, images and other binary formats.
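Pushing documents over HTTP POST can be sketched as below, assuming a SOLR 4.x core whose /update handler accepts a JSON array of documents when the Content-Type is application/json, and a schema that has `id` and `title` fields. `build_update_request` and `index_docs` are hypothetical helper names; the request is built separately so it can be inspected without a server.

```python
import json
from urllib.request import Request, urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumed default core from the example install


def build_update_request(docs, base=SOLR_BASE, commit=True):
    """Prepare an HTTP POST that sends a JSON array of documents to SOLR's /update handler."""
    url = f"{base}/update?wt=json" + ("&commit=true" if commit else "")  # commit makes docs searchable
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def index_docs(docs, **kwargs):
    """Send the documents to a running SOLR server and return its JSON reply."""
    with urlopen(build_update_request(docs, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_update_request([{"id": "page-1", "title": "Hello SOLR"}])
print(req.get_method(), req.full_url)
```

Committing on every request is convenient for a demo; for bulk loads it is usually cheaper to send batches with commit=False and commit once at the end.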
  12. Search: features include paging, filtering, sorting and faceting. Results can be returned as XML (the default), JSON, PHP, Ruby, Python and other formats. A query parser interprets the query string; two commonly used parsers are the standard Lucene query syntax parser and the DisMax (Disjunction Max) parser.
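Each feature on this slide maps to a request parameter on /select: `start` and `rows` for paging, `fq` for filtering, `sort`, `facet` and `facet.field` for faceting, `wt` for the response format and `defType`/`qf` for choosing DisMax. A minimal sketch of a parameter builder; the helper name and the field names in the usage line (`inStock`, `price`, `cat`, `name`, `features`) are assumptions, so substitute fields from your own schema.

```python
from urllib.parse import urlencode


def search_params(query, page=1, rows=10, filters=(), sort=None,
                  facet_fields=(), dismax_fields=None):
    """Build /select parameters as (key, value) pairs, since fq and facet.field may repeat."""
    params = [("q", query), ("wt", "json"),
              ("start", str((page - 1) * rows)),  # paging: offset of the first hit
              ("rows", str(rows))]                # paging: hits per page
    params += [("fq", f) for f in filters]        # filtering: restricts hits without changing scores
    if sort:
        params.append(("sort", sort))             # sorting, e.g. "price asc"
    if facet_fields:
        params.append(("facet", "true"))          # faceting: counts per field value
        params += [("facet.field", f) for f in facet_fields]
    if dismax_fields:
        params.append(("defType", "dismax"))      # DisMax parser instead of the Lucene parser
        params.append(("qf", dismax_fields))      # fields DisMax searches, with optional ^boosts
    return params


p = search_params("solr", page=2, rows=5, filters=["inStock:true"], sort="price asc",
                  facet_fields=["cat"], dismax_fields="name^2 features")
print("http://localhost:8983/solr/select?" + urlencode(p))
```

A list of pairs is used instead of a dict because SOLR accepts several `fq` or `facet.field` values in one request.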
  13. SOLR integration approaches: crawl the site with an external crawler such as Nutch or Heritrix; write CQ servlets that serialize content into a SOLR-ready format (JSON/XML); or register a JCR observer that triggers indexing in SOLR whenever pages are modified.
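The servlet and observer approaches share one data flow: page changes become SOLR documents, which are then posted to SOLR. In AEM itself this would be Java code (for example a JCR `EventListener`); the Python below only models that flow, and `Page`, `page_to_solr_doc`, `IndexingQueue` and the SOLR field names are all hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for the page data a servlet or observer would read from the JCR."""
    path: str
    title: str
    text: str
    tags: list = field(default_factory=list)


def page_to_solr_doc(page):
    """Serialize a page into a SOLR document; these field names are assumptions."""
    return {"id": page.path, "title": page.title, "body": page.text, "tags": page.tags}


class IndexingQueue:
    """Collects page-modification events and flushes them as one SOLR update batch."""

    def __init__(self):
        self._pending = {}  # path -> latest Page; repeated edits to a page collapse into one entry

    def on_page_modified(self, page):
        self._pending[page.path] = page

    def flush(self):
        """Return the JSON body to POST to /update, then clear the queue."""
        docs = [page_to_solr_doc(p) for p in self._pending.values()]
        self._pending.clear()
        return json.dumps(docs)


queue = IndexingQueue()
queue.on_page_modified(Page("/content/site/en", "Home", "Welcome"))
queue.on_page_modified(Page("/content/site/en", "Home v2", "Welcome back"))
queue.on_page_modified(Page("/content/site/en/about", "About", "About us", ["company"]))
print(queue.flush())
```

Batching matters for the observer approach because a single authoring action can fire many JCR events; collapsing them by path avoids re-indexing the same page several times.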
  14. AEM 6: two types of SOLR setup, in-built (embedded) and remote (for distributed setups). ZooKeeper is used to set up a SOLR cluster; a shard is a horizontal partition of the index, and replication sets the number of copies of the index files.
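Sharding and replication can be pictured with a toy model: hash each document id to pick its shard (horizontal partition), then place each shard's copies on distinct nodes. This is a sketch under stated assumptions: it uses CRC32 modulo the shard count and round-robin placement, which is not SolrCloud's actual MurmurHash3-based router, and ZooKeeper's role of holding the cluster state is not modeled. The node names are hypothetical.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document id (CRC32 here, not SolrCloud's router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def route(doc_ids, num_shards):
    """Group document ids by the shard that would hold them; every id lands on exactly one shard."""
    groups = {s: [] for s in range(num_shards)}
    for doc_id in doc_ids:
        groups[shard_for(doc_id, num_shards)].append(doc_id)
    return groups


def place_replicas(num_shards, replication_factor, nodes):
    """Put each shard's copies on distinct nodes, round-robin."""
    if replication_factor > len(nodes):
        raise ValueError("need at least as many nodes as copies of each shard")
    return {
        shard: [nodes[(shard + r) % len(nodes)] for r in range(replication_factor)]
        for shard in range(num_shards)
    }


ids = [f"/content/page-{i}" for i in range(10)]
print(route(ids, 3))
print(place_replicas(3, 2, ["node1", "node2", "node3"]))
```

With three shards and a replication factor of two, each node ends up holding copies of two shards, so losing any single node still leaves every shard available.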
  15. SOLR things we didn’t see: https://github.com/evolvingweb/ajax-solr and http://wiki.apache.org/solr/SolrQuerySyntax
  16. Thanks! @yash_mody http://www.linkedin.com/in/modyyash