Third & Grove
Feb 5, 2016 - Willow Hunt

Customizing Results from Drupal’s Apachesolr Module

 

Recently, one of our clients asked us to change how their site search behaves. They wanted every anchor link on a page to be indexed separately, so that each anchor link gets its own search result. Consider the following document:

Page Header
      Content
Subheader #1 (anchor link)
      Content
Subheader #2 (anchor link)
      Content

We’d like to provide search results not only for the entire page, but also for Subheader #1 and Subheader #2, as if they were separate pieces of content. Fortunately, the Apachesolr module provides a few convenient hooks that make this possible.

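First, we need to parse each document before it is submitted to Solr for indexing. We can use hook_apachesolr_index_document_build() to accomplish this. Here, we render the node body, load it into a DOMDocument, and iterate over every anchor link in it (that is, every <a> element that has an id attribute). For each anchor link, we create a copy of the document, set its content to the text of the anchor's parent element, and give it a unique ID built from the node ID and the anchor ID. Finally, we send the extra documents to Solr with apachesolr_index_send_to_solr().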

 

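/**
 * Implements hook_apachesolr_index_document_build().
 */
function custom_search_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $entity_type, $env_id) {
  // This example only handles nodes that have a body value.
  if ($entity_type != 'node' || !isset($entity->body[LANGUAGE_NONE][0])) {
    return;
  }
  // render() takes its argument by reference, so build the render array first.
  $body = field_view_field('node', $entity, 'body');
  $content = render($body);
  if (empty($content)) {
    return;
  }

  $documents = array();

  // Copy of the document whose content is the full rendered body.
  $paragraph_document = clone $document;
  $paragraph_document->content = trim($content);
  $paragraph_document->teaser = trim($content);
  $documents[] = $paragraph_document;

  // Parse the rendered body, keeping libxml quiet about imperfect HTML.
  $dom = new DOMDocument();
  $previous = libxml_use_internal_errors(TRUE);
  $dom->loadHTML($content);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  foreach ($dom->getElementsByTagName('a') as $link) {
    // Anchor links are <a> elements with an id attribute.
    if ($link->hasAttribute('id') && $link->parentNode->textContent) {
      $paragraph_document = clone $document;

      // Use the text of the anchor's parent element as the snippet.
      $paragraph_document->content = trim($link->parentNode->textContent);
      $paragraph_document->teaser = trim($link->parentNode->textContent);
      // Unique ID: node ID plus anchor ID. The results hook relies on this format.
      $paragraph_document->id = 'node-' . $entity->nid . '-' . $link->getAttribute('id');
      $paragraph_document->bundle = 'link';
      $paragraph_document->bundle_name = 'link';

      $documents[] = $paragraph_document;
    }
  }

  apachesolr_index_send_to_solr($env_id, $documents);
}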
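
The parsing step is the part most likely to surprise you, so it can help to try it outside Drupal first. The following is a minimal standalone sketch of the same DOMDocument loop. The function name example_extract_anchor_snippets() and the sample markup are made up for illustration and are not part of the Apachesolr module.

```php
<?php
// Standalone sketch: no Drupal required. The function name and the sample
// markup are hypothetical, chosen only to illustrate the loop in the hook.

/**
 * Returns an array of anchor id => trimmed text of the anchor's parent element.
 */
function example_extract_anchor_snippets($html) {
  $snippets = array();
  $dom = new DOMDocument();
  // Collect parser warnings quietly; real body HTML is rarely pristine.
  $previous = libxml_use_internal_errors(TRUE);
  $dom->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  foreach ($dom->getElementsByTagName('a') as $link) {
    $text = trim($link->parentNode->textContent);
    // Only <a> elements with an id attribute count as anchor links.
    if ($link->hasAttribute('id') && $text !== '') {
      $snippets[$link->getAttribute('id')] = $text;
    }
  }
  return $snippets;
}

$html = '<h1>Page Header</h1><p>Content</p>'
  . '<h2><a id="sub-1"></a>Subheader #1</h2><p>Content</p>'
  . '<h2><a id="sub-2"></a>Subheader #2</h2><p>Content</p>'
  . '<p><a href="/elsewhere">An ordinary link</a></p>';

$snippets = example_extract_anchor_snippets($html);
print_r($snippets);
```

With this sample markup, the ordinary link is skipped because it has no id, and each snippet is the text of the heading that wraps the anchor. In other words, the snippet covers only the anchor's parent element, so where the anchor sits in your markup determines how much text each extra document contains.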

 

If we reindex the site now, we should see that a new Solr document has been created for each anchor link. However, these results still link to the top of the container page, and each one still carries the container page's title. To make each result lead to the anchor link it's associated with, we can use hook_apachesolr_process_results().

 

/**
 * Implements hook_apachesolr_process_results().
 */
function custom_search_apachesolr_process_results(array &$results, DrupalSolrQueryInterface $query) {
  foreach ($results as $key => $result) {
    // Only touch the documents we created for anchor links.
    if ($result['bundle'] == 'link') {
      // The document ID is 'node-[nid]-[anchor id]'; strip the prefix to get the anchor ID.
      $paragraph_id = str_replace('node-' . $result['fields']['entity_id'] . '-', '', $result['fields']['id']);
      // Point the result at the anchor and use the snippet text as its title.
      $results[$key]['link'] = $result['link'] . '#' . $paragraph_id;
      $results[$key]['title'] = strip_tags($result['snippet']);
    }
  }
}

 

Here, we build a title for each anchor link result from its snippet, and we append the anchor ID we stored in the document ID to the result's link as a URL fragment. When we click on the result, we’ll be taken to the paragraph containing the anchor link instead of the top of the document.
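
To check the link and title rewriting without a running Solr server, the same logic can be exercised on a hand-built results array. This is only a sketch: the function name example_anchor_results() and the sample data are hypothetical, and the array keys simply mirror the ones the hook above reads. It also strips the ID prefix only from the start of the string, which is a slightly stricter variant of the str_replace() call in the hook.

```php
<?php
// Standalone sketch: no Drupal or Solr required. The function name and the
// sample results are hypothetical; only the array keys mirror the hook.

function example_anchor_results(array $results) {
  foreach ($results as $key => $result) {
    if ($result['bundle'] !== 'link') {
      continue;
    }
    $prefix = 'node-' . $result['fields']['entity_id'] . '-';
    $id = $result['fields']['id'];
    // Strip the prefix only when it appears at the start of the ID.
    if (strpos($id, $prefix) === 0) {
      $anchor = substr($id, strlen($prefix));
      $results[$key]['link'] = $result['link'] . '#' . $anchor;
      $results[$key]['title'] = strip_tags($result['snippet']);
    }
  }
  return $results;
}

$results = example_anchor_results(array(
  array(
    'bundle' => 'link',
    'link' => 'http://example.com/node/42',
    'title' => 'Page Header',
    'snippet' => '<strong>Subheader</strong> #1',
    'fields' => array('entity_id' => 42, 'id' => 'node-42-sub-1'),
  ),
  array(
    'bundle' => 'page',
    'link' => 'http://example.com/node/42',
    'title' => 'Page Header',
    'snippet' => 'Content',
    'fields' => array('entity_id' => 42, 'id' => 'abc123/node/42'),
  ),
));

echo $results[0]['link'], "\n";
echo $results[0]['title'], "\n";
```

The first sample result gains the #sub-1 fragment and a title taken from its snippet, while the second, which is not in the link bundle, passes through unchanged.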

There we have it! You can use this as a template for parsing through content and providing different behaviors on site-wide search!