Mastering Solr
   Jur de Vries
Who am I?

Developer/architect at Triquanta
Trainer at Wizzlern
Use case

Market place
Advertisements
Adjust relevancy
Paid boosting of ads
Of course we use Drupal and Apache Solr
Running Solr Locally

Download latest version (3.6)
Be sure to download distribution (not src)
Unpack solr
Go to example directory
Run
 java -jar start.jar
Drupal: which contrib?

2 Possibilities
  Apachesolr search
  Search api with solr backend
Apache solr search

Strengths:
  Supported by Acquia
  Easy to set up
  Mature
Weaknesses
  Integration with views (still in dev)
Search Api

Strengths
  Flexible
  Indexes all entities
  Excellent views integration
  Related fields are easy to add to index
Weaknesses
  Not supported (yet) by Acquia
  Solr backend has some issues
Drupal: which contrib?

Apachesolr search integration
  Quick setup
  Acquia
Search API
  Exportable configuration
  Views integration
  Index all entities
Depends on your needs
Basic use of search api

Create server
Create index
  Select fields to index
  Define data alterations
  Define processors
Start indexing
Field types

Integer, date, boolean
String or fulltext?
  Fulltext will get processed!
      Tokenize
      Stopwords
      Ignore case
  String is as is
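
The difference between fulltext and string can be sketched roughly as follows. This is only a toy approximation in Python, not Solr's actual analyzer chain; the stopword list and the function names are assumptions made for illustration.

```python
import re

# Illustrative stopword list (an assumption; Solr reads its own list from a file).
STOPWORDS = {"and", "or", "the", "a", "of"}


def analyze_fulltext(value):
    """Approximate fulltext processing: tokenize, ignore case, drop stopwords."""
    tokens = re.findall(r"\w+", value)        # tokenize on word characters
    tokens = [t.lower() for t in tokens]      # ignore case
    return [t for t in tokens if t not in STOPWORDS]


def analyze_string(value):
    """A string field is kept as is: one single, unmodified term."""
    return [value]


print(analyze_fulltext("The Bike and the Car"))
print(analyze_string("The Bike and the Car"))
```

A search for "bike" can match the fulltext version, while the string version only matches the exact original value.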
Demo

Run solr
Copy schema.xml and solrconfig.xml (!)
Create server
Create index
Create view
  ads
  Ad filter exposed: search
Advanced use of Search api

This talk is about Solr, not about search API
Understand Solr first!
Many resources on the web
Watch screencasts etc
Mastering Solr

Mastering solr is understanding solr
What happens after a Drupal module?
Let's have a look at the request
Solr request

Look at solr log
Parameters:
  start
  rows
  q (query)
  qf (query fields)
  fl (fields)
  fq (filter query)
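
To make the parameters concrete, here is a minimal sketch that assembles such a request by hand. The endpoint path, the field names (t_title, t_body, ss_type) and the helper name are assumptions for illustration; the real request in your log will contain more parameters.

```python
from urllib.parse import urlencode

# Assumed local endpoint of the Solr example server.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, query_fields, filters, start=0, rows=10):
    """Build a select URL from the parameters listed on this slide."""
    params = [
        ("q", query),                      # what the user typed
        ("qf", " ".join(query_fields)),    # fields to search in (dismax)
        ("fl", "item_id,score"),           # fields to return
        ("start", start),                  # paging offset
        ("rows", rows),                    # page size
    ]
    params += [("fq", f) for f in filters]  # one fq per filter
    return SOLR_SELECT + "?" + urlencode(params)


url = build_select_url("bike", ["t_title", "t_body"], ["ss_type:ad"])
print(url)
```

Pasting a URL like this into the browser is a quick way to experiment without going through Drupal.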
Field names

item_id, id
t_.., ss_.., → why?
Solr has to know how to handle fields
Field api: field names differ
Dynamic field names: tell solr field type!
Schema.xml

Defines field types and fields
The real tweaking starts here!
Let's have a look!
  dynamicField
  field type
  analyzers
Copyfield
What can you do in schema.xml?

Synonyms (disabled by default)
Stopwords (and, or, etc)
Stemming
Proper multilingual handling
Browse the schema

Solr offers schema browsing
Go to: http://localhost:8983/solr/admin
Search relevancy

Types of boosting:
  Field level boost
  Boost function
  Boost query
  (QueryElevation)
Boost parameters

Field level boosting: qf
   qf=t_body^20
   score in field is multiplied by 20
Boost function: bf
   bf=product(fieldname,2)
   result of function is added to score
Boost query: bq
Boost: boost (edismax only), like bf but the score is multiplied instead of added
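
The difference between adding (bf) and multiplying (boost) is easy to see with plain numbers. This is a simplified sketch that ignores query normalization and everything else Solr does while scoring; the function names are made up for illustration.

```python
def additive_boost(base_score, function_value):
    """bf style: the function result is added to the score."""
    return base_score + function_value


def multiplicative_boost(base_score, function_value):
    """boost style (edismax): the score is multiplied by the function result."""
    return base_score * function_value


# The same function value applied to a weak and a strong text match.
for base in (0.2, 2.0):
    print(base, additive_boost(base, 0.5), multiplicative_boost(base, 0.5))
```

With an additive boost the effect depends on how large the text score happens to be; a multiplicative boost changes every score by the same proportion.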
Let's boost title

Field level boost is incorporated in Search API...
But, where are the numbers in the request???
Search api solr forgot to add them!
There is a patch :-)
But let's do it another way...
Debugging Solr

Let's add &echoParams=all to the request...
Where do all these parameters come from?
Solrconfig.xml!!!
Among other things: request handler
Let's look at the dismax request handler
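
A small helper for adding a debugging parameter to a request copied from the log, as a sketch. The sample URL and the helper name are assumptions; only the echoParams=all parameter comes from the slide.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_param(url, name, value):
    """Return the URL with one extra query parameter appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical request taken from the Solr log.
logged = "http://localhost:8983/solr/select?q=bike&rows=10"
print(add_param(logged, "echoParams", "all"))
```

Open the resulting URL in the browser to see every parameter Solr applied, including the defaults that were never part of the request.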
Solrconfig.xml

(Default) Request handler:
  Default parameters
  Add Spellcheck
  Tweak all kinds of search behavior!
  Let's add default search fields with boost
Boost function

Mathematical functions on field values
Available functions:
  sum(x,y): x + y
  product(x,y): x * y
  scale(x, minTarget, maxTarget)
  recip(x, m, a, b): a / (m * x + b)
  ms(): time → ms(NOW/DAY, created)
  Many more!
Boost date

We need ms(): big values!
Linear? Too much difference
Recip!
recip(x,1,1000,1000)
if x = 1000: result is 0.5 (half)
1 year: 3.1e10
recip(ms(NOW/YEAR?, created),1,3.1e10,3.1e10)
bf=recip(ms(NOW/YEAR?, created),1,3.1e10,3.1e10)^3
Use a graphing tool!
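
If no graphing tool is at hand, a few lines of Python show the shape of the curve. This sketch uses recip(x,m,a,b) = a / (m*x + b) and the rounded 3.1e10 milliseconds per year from this slide; the helper names are made up for illustration.

```python
MS_PER_YEAR = 3.1e10  # rounded value from the slide


def recip(x, m, a, b):
    """Reciprocal function: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Boost for a document of the given age, using the slide's constants."""
    return recip(age_ms, 1, MS_PER_YEAR, MS_PER_YEAR)


# Boost for documents that are 0, 1, 3 and 9 years old.
for years in (0, 1, 3, 9):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

A brand new document gets 1, a one year old document gets half of that, and older documents fade out slowly instead of dropping off a cliff.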
Boost queries

Write a query as you would for fq:
Boost ads:
  content_type:add
  bq=content_type:add
  bq=(content_type:add)^20
Debugging relevancy

We know how to boost
How can finetuning be done?
Solr has the solution:
  add debugQuery=on
debugQuery=on


(Screenshots: debugQuery output in the normal browser view and in the page source)
Relevancy

Choose your boosting methods
Try in your browser
Finetuning: debugQuery=on, source
Add parameters to solrconfig.xml
Or...
Add parameters in code

use
hook_search_api_solr_query_alter(array
  &$call_args, SearchApiQueryInterface $query)
$call_args['params']['bq'] = '(t_title:foo)^20';
$call_args['params']['bf'][] = 'b_promote';
Override solr service class

In Search API: define server class
extend solr service class
Only change key methods
It's all about passing parameters!
Conclusion

Tweak indexing in schema.xml
  Stopwords
  Multilingual
Tweak searching in solrconfig.xml
Tweak searching by passing variables
This is only an introduction!
Questions?
Feedback & follow-up:
http://drupalcampgent.be/feedback
