Presented by Peter Wolanin | Acquia, Inc - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
If you have a new web project or an existing Drupal site, the combination of Drupal and Apache Solr is both powerful and easy to set up thanks to the existing integration code. The module allows for substantial customization through the administrative UI. Drupal facilitates further customization of the UI, indexing, and boosting because its open architecture provides multiple opportunities for custom code to alter the behavior. A couple of code snippets will be followed by a review of other contributed Drupal modules that further enhance the search capability.
Finally, this session will showcase some examples of Drupal sites using Solr, including Acquia's own sites and many well-known enterprise and government sites.
Things Made Easy: One Click CMS Integration with Solr & Drupal
1. May 10, 2012
Peter M. Wolanin, Ph.D.
Momentum Specialist (principal engineer), Acquia, Inc.
Drupal contributor drupal.org/user/49851
co-maintainer of the Drupal Apache Solr Search Integration module
2. Key Questions to Be Answered
• What is Drupal?
• What Apache Solr features are integrated with
Drupal?
• Why is Drupal plus Apache Solr better than
starting from scratch?
• What elements of the search can you
configure in the UI without code?
3. Why Are You Here?
• You are starting a new website project?
• You are wondering how hard it is to actually
integrate Apache Solr with a website?
• You already use Drupal but not with Apache
Solr?
• You like things that are easy yet powerful?
4. Drupal: Web Application Framework +
CMS == Social Publishing Platform
Drupal “… is as much a Social Software platform
as it is a web content management system.”
CMS Watch, The Web CMS Report 2009
[Diagram: overlap of Content Mgmt Systems and Social Software Tools. Labels: content, users, workflow, taxonomy, semantic web, RSS, analytics, blogs / wikis, forums / comments, social ranking, social tagging, social networks]
5. Drupal + Solr Provides Immediate
Access to Rich Search Features
Dynamic content requires dynamic navigation -
which is provided by an effective search
Search facets mean no dead ends
Solr provides better keyword relevancy in results
Much faster searches for sites with lots of content
By avoiding database queries, Drupal with Solr
scales better
6. DEMO:
A Drupal 7 partial copy of the conference
site with Apache Solr integration
http://youtu.be/yY6kma_ViWc
8. Drupal Has User Accounts, Roles
& Permissions
Define custom roles
Set granular access
controls by role
Configure user
behavior:
– Registration
– Email
– Profiles
– Pictures
9. Drupal Modules Add
Functionality
“There’s a module for that”
More than 4100 Drupal 7
community modules
Often controlled by role-
based permissions
Drupal core and modules
are GPL v2+, and have a
huge, active community
10. Drupal is Written in PHP, Which
Makes for Easy Customization
The Drupal architecture encourages and provides
many avenues for customization by writing
modules but not patching Drupal core
Drupal has a huge community of users.
Approximately 10,000 sites report to Drupal.org
that they use the Apache Solr Search Integration
module.
12. Drupal Entities are Content + Data
Nodes are the basic entity
used for text content
The entity system is
extensible - can represent
any data
Examples of data stored
within Drupal entities
– Text
– geographic location
– Node reference
[Diagram: grid of nodes, Node 1 through Node 9]
13. Entity Types are Enriched With
User-configurable Data Fields
Define new data fields on
a node using the Field API
module.
– Text, images, integers, date,
reference, etc
Flexible and configurable
in the UI
No programming required
(many existing modules)
15. A Strong Framework for
Content Classification
Core taxonomy system
Modules provide
taxonomy-based
appearance, access
control
Standard input options
include free tagging,
flat-controlled, and
hierarchical-controlled
16. Drupal + Solr Search for Business,
Government and NGOs
http://www.mattel.com/search/apachesolr_search/
https://www.eff.org/search/site/
http://www.poly.edu/search/apachesolr_search/
http://www.whitehouse.gov/search/site/
http://opensource.com/search/apachesolr_search/
https://www.ethicshare.org/publications/
http://www.nypl.org/search/apachesolr_search/
http://www.mylifetime.com/community/search/apachesolr_search/
http://www.emporia.edu/search/site/
http://www.restorethegulf.gov/search/apachesolr_search/
http://www.hrw.org/en/search/apachesolr_search/
17. Drupal Has Already Solved Many
Solr Integration Challenges
The most important - content indexing.
Facets, sorting, and highlighting of results.
Immediate integration with the More Like This
and spell-check handlers.
The included sub-module enforces content access
permissions by indexing them into Solr and filtering
results based on the current user.
19. The Module Has a Pipeline for
Indexing Drupal Content to Solr
Drupal entities are processed into one (or more)
document objects. Each document object is
converted to XML and sent to Solr.
Node object → (Drupal functions) → Document object → XML string
Field mapping from node object to document object:
– (entity type) → entity_type
– title → label
– nid → entity_id
– type → bundle
Resulting XML string:
<doc>
<field name="entity_type">node</field>
<field name="label">Hello Drupal</field>
<field name="entity_id">101</field>
<field name="bundle">session</field>
</doc>
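The mapping on this slide can be illustrated with a short, language-neutral sketch. This is not the module's PHP code: the function name `document_to_solr_xml` and the dictionary shapes are assumptions made for illustration, and only the four field names and sample values come from the slide. In a real Solr update message the `<doc>` element would additionally be wrapped in an `<add>` element.

```python
import xml.etree.ElementTree as ET


def document_to_solr_xml(document):
    """Serialize a flat field mapping into a Solr <doc> XML string.

    Hypothetical helper: list values become repeated <field> elements,
    which is how multi-valued fields are expressed in Solr update XML.
    """
    doc = ET.Element("doc")
    for name, value in document.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(item)
    return ET.tostring(doc, encoding="unicode")


# A stand-in for a Drupal node (only the properties named on the slide).
node = {"type": "session", "nid": 101, "title": "Hello Drupal"}

# Node properties are renamed to generic entity fields.
document = {
    "entity_type": "node",
    "label": node["title"],
    "entity_id": node["nid"],
    "bundle": node["type"],
}

xml_string = document_to_solr_xml(document)
print(xml_string)
```

Building the XML with an element tree rather than string concatenation means characters such as `&` in a title are escaped automatically.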
20. Entity Meta-data Gives
Automatic Facets!
Content types
Taxonomy terms per
vocabulary
Content authors
Posted and modified dates
Text and numbers selected
via select list/radios/check
boxes
21. Drupal Modules Implement hooks
to Control Indexing and Display
HOOK_apachesolr_index_document_build($document,
$entity, $entity_type, $env_id)
By creating a Drupal module (in PHP), you can
implement module and theme “hooks” to extend or
alter Drupal behavior.
Change or replace the data normally indexed.
Modify the search results and their appearance.
22. Updates to an Entity or Related
Meta-data Cause Reindexing
Drupal entities are indexed during Drupal cron
(typically invoked via *nix cron).
By using a specialized tracking table, content
can automatically be queued for reindex when
changed, and subsets of content can potentially
be sent to different Solr indexes.
Entities include many ID-based reference fields
(e.g. the User ID of the author). Changes to the
referenced data are also watched.
23. Indexing Tracking Tables Maintain
Order
+-------------+-----------+-------------+--------+------------+
| entity_type | entity_id | bundle | status | changed |
+-------------+-----------+-------------+--------+------------+
| node | 36 | session | 1 | 1336520756 |
| node | 37 | session | 1 | 1336510489 |
| node | 38 | session | 1 | 1336510456 |
| node | 39 | session | 1 | 1336510456 |
| node | 40 | speaker_bio | 1 | 1336510456 |
+-------------+-----------+-------------+--------+------------+
When a node is updated, the “changed” timestamp
is updated.
The indexing pipeline tracks the largest timestamp
and entity_id which has been indexed.
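One way to picture how a (timestamp, entity_id) position yields a stable indexing order is the sketch below. It is an illustration under assumptions, not the module's actual query: the table name `index_tracking` and the function `next_batch` are hypothetical, and SQLite stands in for Drupal's database. The rows are the ones shown on the slide.

```python
import sqlite3


def next_batch(conn, last_changed, last_entity_id, limit):
    """Return up to `limit` (entity_id, changed) rows after the last indexed position.

    Rows are ordered by (changed, entity_id), so entities sharing a
    timestamp are still visited exactly once and in a stable order.
    """
    return conn.execute(
        """
        SELECT entity_id, changed FROM index_tracking
        WHERE status = 1
          AND (changed > ? OR (changed = ? AND entity_id > ?))
        ORDER BY changed ASC, entity_id ASC
        LIMIT ?
        """,
        (last_changed, last_changed, last_entity_id, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE index_tracking ("
    "entity_type TEXT, entity_id INTEGER, bundle TEXT, status INTEGER, changed INTEGER)"
)
conn.executemany(
    "INSERT INTO index_tracking VALUES (?, ?, ?, ?, ?)",
    [
        ("node", 36, "session", 1, 1336520756),
        ("node", 37, "session", 1, 1336510489),
        ("node", 38, "session", 1, 1336510456),
        ("node", 39, "session", 1, 1336510456),
        ("node", 40, "speaker_bio", 1, 1336510456),
    ],
)

# First cron run: nothing indexed yet, so start from position (0, 0).
first = next_batch(conn, 0, 0, limit=2)
last_id, last_changed = first[-1]

# Second cron run resumes after the stored position.
second = next_batch(conn, last_changed, last_id, limit=2)
print(first, second)
```

Tracking the entity_id alongside the timestamp matters here because three rows share the timestamp 1336510456; a timestamp alone could not tell which of them a batch of two had already covered.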
24. Example: Taxonomy Term
Classifying a Node is Changed
Grapefruit Citrus fruit
function apachesolr_taxonomy_term_update($term)
All nodes classified with this term are queued
to be re-indexed by setting the “changed”
column to the current time.
Thus you will correctly match ‘Citrus’ instead of
‘Grapefruit’ for those documents.
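The requeue step described here amounts to a timestamp update on the tracking rows. The following sketch assumes a hypothetical table `index_tracking` and helper `requeue_entities`; which nodes carry the changed term is also made up for the example, and the current time is passed in explicitly so the result is reproducible.

```python
import sqlite3


def requeue_entities(conn, entity_type, entity_ids, now):
    """Set 'changed' to `now` so the rows sort after everything already indexed."""
    conn.executemany(
        "UPDATE index_tracking SET changed = ? "
        "WHERE entity_type = ? AND entity_id = ?",
        [(now, entity_type, entity_id) for entity_id in entity_ids],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE index_tracking (entity_type TEXT, entity_id INTEGER, changed INTEGER)"
)
conn.executemany(
    "INSERT INTO index_tracking VALUES (?, ?, ?)",
    [("node", 36, 1336520756), ("node", 37, 1336510489), ("node", 38, 1336510456)],
)

# Assumption for the example: nodes 37 and 38 are classified with the changed term.
requeue_entities(conn, "node", [37, 38], now=1336600000)

changed = dict(conn.execute("SELECT entity_id, changed FROM index_tracking"))
print(changed)
```

Because the new timestamp is larger than any position the indexer has recorded, the affected nodes are picked up again on the next cron run without any separate queue.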
25. When Unpublished, Content is
Purged
Drupal core includes a simple editorial workflow
where content may be toggled between
published (visible) and unpublished
(incomplete, removed, spam, etc).
The module immediately removes content from
the index when unpublished, and also tracks it
for future removal in case the Solr server is
unavailable.
26. Search Using Dismax Query
Parsing & Boosting Features
Dynamic fields in schema.xml used to index
standard and custom entity data fields
Dismax (or EDismax) handler used for keyword
searching across multiple fields and per-field boosts
Query-time boosting options available in the UI
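As a rough illustration of per-field boosts, the sketch below assembles request parameters for an EDismax keyword search. `defType`, `qf`, `q`, and `wt` are standard Solr request parameters; the function name `build_search_params`, the field name `content`, and the boost values are assumptions chosen for the example and are not taken from the module's schema.

```python
from urllib.parse import urlencode


def build_search_params(keywords, field_boosts):
    """Build Solr request parameters for an EDismax keyword search.

    `field_boosts` maps field names to weights; they are joined into the
    `qf` (query fields) parameter as "field^boost" pairs.
    """
    return {
        "q": keywords,
        "defType": "edismax",
        "qf": " ".join(f"{field}^{boost}" for field, boost in field_boosts.items()),
        "wt": "json",
    }


# Hypothetical weights: matches in the label count five times as much as body text.
params = build_search_params("drupal solr", {"label": 5.0, "content": 1.0})
query_string = urlencode(params)
print(query_string)
```

Changing a weight in `field_boosts` changes only the `qf` parameter, which is why boosts like these can be exposed as settings in an administrative UI without touching the index.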
27. A Query Object Is Used to
Prepare and Run Searches
HOOK_apachesolr_query_prepare($query)
$query->setParam('hl.fl', $field);
$keys = $query->getParam('q');
$response = $query->search();
28. More Modules Available to
Add More Features
A few examples:
ApacheSolr Attachments
Apache Solr Multisite Search
Apache Solr Organic Groups Integration
Apachesolr User indexing
Apachesolr Commerce
29. To Wrap Up!
Drupal has extensive Apache Solr integration
already, and is highly customizable.
The Drupal platform is widely adopted, and the
Drupal community drives rapid innovation.
Acquia provides Enterprise Drupal support and a
network of partners.
Acquia includes a secure, hosted Solr index with
every support subscription.
30. Did I Answer These?
• What is Drupal?
• What Apache Solr features are integrated with
Drupal?
• Why is Drupal plus Apache Solr better than
starting from scratch?
• What elements of the search can you
configure in the UI without code?
31. Other PHP Integration Tools
• http://www.solarium-project.org/
• http://php.net/solr
http://pecl.php.net/package/solr
• http://code.google.com/p/solr-php-client/
Caveat: don’t use the serialized PHP response format in
a custom integration - use the JSON writer.
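For a custom integration that requests `wt=json`, reading the response is ordinary JSON parsing. The sketch below uses a hand-written sample body in the standard Solr response layout (`responseHeader`, then `response` with `numFound`, `start`, and `docs`); the document values are reused from the indexing slide and the function name `parse_results` is hypothetical.

```python
import json

# Hand-written sample of a Solr JSON response; not captured from a real server.
sample_body = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {"entity_type": "node", "entity_id": 101,
       "label": "Hello Drupal", "bundle": "session"}
    ]
  }
}
"""


def parse_results(body):
    """Return the total hit count and the label of each returned document."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], [doc["label"] for doc in response["docs"]]


total, labels = parse_results(sample_body)
print(total, labels)
```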
32. Acquia is Hiring!
• Do you love Drupal, Solr, the LAMP stack,
DevOps or anything related, and working at a
fast-growing and successful startup?
• Boston and Portland area U.S. offices.
• Some remote opportunities as well.
• Come talk to me!
peter.wolanin@acquia.com
pwolanin in IRC #drupal or #solr