Custom Content Indexing In Habitat Based Solutions Using Computed Fields

Posted 12/20/2016 by Ed Kapuscinski

There are times in a Sitecore developer's life when you realize that the standard search behavior just isn't going to cut it. In my experience, this often happens when you "flatten" Sitecore content from numerous items into a single page. This is increasingly common in solutions that use componentized architectures to support Sitecore's most powerful personalization and testing features. Luckily, Sitecore makes it relatively easy to customize what gets indexed. The hardest part is figuring out which pieces you need to create and which you need to touch. This can be especially confusing in a Habitat/Helix based foundation/feature/project architecture.

There are two main things you need to do.

  1. Populate a field in the index.
  2. Tell the search to look at the field.

The example I'm using in this article is indexing the content of the tabs displayed on a page alongside the page itself, even though those tabs are stored as child items of the page's item.

Here's how it looks on a page.
[Image: How the rendered tabs look on the page.]

And here's how those tabs live in Sitecore.
[Image: The content tree that holds the page and its tabs.]

Populating the index

The first step is populating the index. This is done by adding a "field" definition in the Sitecore configs at configuration/sitecore/contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fields. This field definition tells Sitecore what to index and how. The field can easily be patched in using the standard config patching method: create a config file and make sure it ends up in the App_Config/Include folder. In my solution, I created a Feature.TabSection.Indexing.config file that went in the project's App_Config/Include/Feature folder with the following contents.

Note: This XML is not the full working version. The complete file appears near the end of this article.

   
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration
          type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <fields hint="raw:AddComputedIndexField">
            <field fieldName="tabsContent" type="Sitecore.Feature.TabSection.Indexing.TabSectionField, Sitecore.Feature.TabSection" storageType="yes"  indexType="tokenized" />
          </fields>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

This config tells Sitecore (and Lucene) to index a field called "tabsContent", and to use the "Sitecore.Feature.TabSection.Indexing.TabSectionField.ComputeFieldValue()" method contained in the "Sitecore.Feature.TabSection.dll" DLL to populate it.

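Notice that the "ComputeFieldValue" method name isn't mentioned in the config. Sitecore knows to call it because the class implements the IComputedIndexField interface. Likewise, the ".dll" part is left out of the assembly name, because the type attribute takes a .NET assembly name rather than a file name. It feels like magic, but it isn't.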
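
The "magic" can be demystified a little with plain .NET reflection. The sketch below is not Sitecore's actual factory code; it only illustrates the general pattern of resolving a "Namespace.Type, AssemblyName" string and then calling a known method through an interface. IComputedField, ShoutingField, and FieldFactory are hypothetical names invented for this illustration.

```csharp
using System;

namespace Demo
{
    // Hypothetical stand-in for Sitecore's IComputedIndexField.
    public interface IComputedField
    {
        object ComputeFieldValue(string input);
    }

    // Hypothetical computed field: it just upper-cases its input.
    public class ShoutingField : IComputedField
    {
        public object ComputeFieldValue(string input)
        {
            return input.ToUpperInvariant();
        }
    }

    public static class FieldFactory
    {
        // Resolves a "Namespace.Type, AssemblyName" string to an instance.
        // The assembly part is a name, not a file name, so there is no ".dll".
        public static IComputedField Create(string typeName)
        {
            Type type = Type.GetType(typeName, throwOnError: true);

            // The caller only knows the interface, which is why the method
            // name never has to appear in any configuration.
            return (IComputedField)Activator.CreateInstance(type);
        }
    }
}
```

Given a type string built as "Demo.ShoutingField, " plus the name of the assembly the class is compiled into, FieldFactory.Create returns an object whose ComputeFieldValue("tabs") yields "TABS".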

I put the ComputeFieldValue() method in a class file that I created in my TabSection project's "Indexing" folder.

[Image: Where the TabSectionComputedField code file lives in my VS solution.]

This file contains the logic to extract the data from the appropriate child items and populate the field with a string containing their content.

using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using System.Text.RegularExpressions;
using Sitecore.Foundation.SitecoreExtensions.Extensions;

namespace Sitecore.Feature.TabSection.Indexing
{
    /// <summary>
    /// Computed field that contains the contents of a page's tabs.
    /// </summary>
    public class TabSectionField : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        const string HTML_TAG_PATTERN = "<.*?>";

        public object ComputeFieldValue(IIndexable indexable)
        {
            // Housekeeping
            var sitecoreIndexable = indexable as SitecoreIndexableItem;
            if (sitecoreIndexable == null) return null;

            // Get the folder item that contains the tabs for the page. Stop work if there isn't one.
            var tabFolder = sitecoreIndexable.Item.Children.FirstOrDefault(i => i.IsDerived(Templates.TabSection.ID));
            if (tabFolder == null) return null;

            // Get all tab content items
            var tabItems = tabFolder.Axes.GetDescendants().Where(i => i.IsDerived(Templates.HasTabContent.ID));

            // If there are tabs, get to work.
            if (tabItems.Any())
            {
                // Get the content from the tab items
                var contentToAdd = tabItems.SelectMany(GetItemContent).ToList();

                // If the tabs have no text content, return null so we don't fill the index with empty strings.
                if (contentToAdd.Count == 0) return null;
                var tabContent = string.Join(" ", contentToAdd);

                // Send it to the index.
                return tabContent;
            }
            else
            {
                // If there are no tabs, return null so we don't fill the index with empty strings.
                return null;
            }
        }


        /// <summary>
        /// Extracts the text content from an item's fields
        /// </summary>
        protected virtual IEnumerable<string> GetItemContent(Item dataSource)
        {
            foreach (Field field in dataSource.Fields)
            {
                // Make sure it's a text field
                if (!IndexOperationsHelper.IsTextField(new SitecoreItemDataField(field))) continue;

                // Get the field values
                string fieldValue = field.Value;
                if (string.IsNullOrEmpty(fieldValue))
                {
                    fieldValue = string.Empty;
                }
                // Strip out HTML, the index doesn't need it.
                fieldValue = Regex.Replace(fieldValue, HTML_TAG_PATTERN, string.Empty);
                if (!string.IsNullOrWhiteSpace(fieldValue)) yield return fieldValue;
            }
        }
        
    }
}
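
To make the flattening step easier to reason about outside of Sitecore, here is a minimal, self-contained sketch of the same strip-and-join logic applied to plain strings. TabContentFlattener and Flatten are hypothetical names used only for this illustration; the regex pattern is the same one used in the class above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that mirrors the computed field's text handling
// without depending on any Sitecore types.
public static class TabContentFlattener
{
    private const string HtmlTagPattern = "<.*?>";

    // Strips HTML tags from each value, drops values that end up blank,
    // and joins the rest with single spaces. Returns null when nothing
    // is left, so the index isn't filled with empty strings.
    public static string Flatten(IEnumerable<string> fieldValues)
    {
        List<string> cleaned = fieldValues
            .Where(value => !string.IsNullOrEmpty(value))
            .Select(value => Regex.Replace(value, HtmlTagPattern, string.Empty))
            .Where(value => !string.IsNullOrWhiteSpace(value))
            .ToList();

        return cleaned.Count == 0 ? null : string.Join(" ", cleaned);
    }
}
```

Under these assumptions, Flatten(new[] { "<p>Overview of <strong>services</strong></p>", "", "<br/>", "Pricing" }) returns "Overview of services Pricing", which is the kind of single string the computed field hands to the index.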

To see it in action, you'll need to publish your project, making sure the DLL gets updated and the config file gets deployed.

You can test that this is working in a number of ways. You can use the indexing manager to rebuild the sitecore_*_index indexes, or you can go to an item and use the Re-Index Tree option in the Sitecore Developer ribbon. Don't see the Developer ribbon? Right-click on the ribbon and check the "Developer" checkbox.

[Image: Developer Toolbar Indexing Options Screenshot]

This will reindex the content item you're viewing, and any children. Now you can look at the index to see if your values are populated. 

I use a tool named "Luke", which is a Java app (yuck) that lets you explore a Lucene index. In Luke, open the index by going to the root of the index's folder in your data directory (in my dev environment, the web index is "C:\inetpub\wwwroot\NTTSecurity\Data\indexes\sitecore_web_index"). Luke is clearly software designed by developers, but it gives you a valuable peek inside the indexes, and is worth becoming familiar with. 

You can look for items in the index that have content in their computed field by doing a search in Luke. Use the "Search" tab, enter your text on the left, and choose your field name from the dropdown on the right.

[Image: Finding Our Field's Content In Luke]

Once you've found an item that should have content, double click on the record and you can see its details. Scroll to our field (they're alphabetical) and you can view its data with a right click.

[Image: Viewing our field's content in Luke]

When you see your content in the index, you're ready to move on to the next part of the job: telling the search itself to look at these fields.

This is done with a little code and a config file entry.

The config file tells Sitecore to look at the code, and the code tells Sitecore which field (or fields) to add when a search query is created.

The config entry looks like this:

<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <solutionFramework>
      <indexing>
        <providers>
          <add name="tabSection" type="Sitecore.Feature.TabSection.Indexing.TabSectionIndexContentProvider, Sitecore.Feature.TabSection"
               patch:before="add[@name='fallback']" />
        </providers>
      </indexing>
    </solutionFramework>
  </sitecore>
</configuration>
This tells Sitecore to go look at our TabSectionIndexContentProvider when performing a search. That code looks like this.
namespace Sitecore.Feature.TabSection.Indexing
{
    using System;
    using System.Collections.Generic;
    using System.Linq.Expressions;
    using Sitecore.Foundation.Indexing.Infrastructure;
    using Sitecore.Foundation.Indexing.Models;
    using Sitecore.Foundation.SitecoreExtensions.Repositories;
    using Sitecore.ContentSearch.SearchTypes;
    using Sitecore.Data;

    public class TabSectionIndexContentProvider : IndexContentProviderBase
    {
        // Tell it which content type you're working with
        public override string ContentType => DictionaryRepository.Get("/TabSection/search/contenttype", "TabSection");

        // Let it know which template types have the tab content 
        public override IEnumerable<ID> SupportedTemplates => new[]
        {
            Templates.HasTabContent.ID
        };
        // Add the field to the search query.
        public override Expression<Func<SearchResultItem, bool>> GetQueryPredicate(IQuery query)
        {
            var fieldNames = new[]
            {
                "tabsContent"
            };
            return this.GetFreeTextPredicate(fieldNames, query);
        }

        public override void FormatResult(SearchResultItem item, ISearchResult formattedResult)
        {
            var contentItem = item.GetItem();
        }
    }
}
The important part is the GetQueryPredicate override. Note how the array contains the name of the field we populated earlier: "tabsContent". That's the magic.
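
As a rough mental model (and not the actual implementation of GetFreeTextPredicate, which lives in the foundation layer), a free-text predicate over a list of fields can be thought of as "does any of these fields contain the query text?". The sketch below models an index document as a plain dictionary; FreeTextSketch and BuildPredicate are hypothetical names, and the case-insensitive substring match is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FreeTextSketch
{
    // Builds a predicate that is true when any of the named fields
    // contains the query text. A document is modeled here as a simple
    // field-name-to-text dictionary (an assumption for illustration).
    public static Func<IDictionary<string, string>, bool> BuildPredicate(
        string[] fieldNames, string queryText)
    {
        // An empty query matches nothing.
        if (string.IsNullOrWhiteSpace(queryText))
        {
            return doc => false;
        }

        return doc => fieldNames.Any(name =>
        {
            string value;
            return doc.TryGetValue(name, out value)
                && value != null
                && value.IndexOf(queryText, StringComparison.OrdinalIgnoreCase) >= 0;
        });
    }
}
```

With a document whose "tabsContent" entry is "Overview of services Pricing", the predicate built for the field list { "tabsContent" } and the query "pricing" evaluates to true, while the same query against a field list of only { "title" } does not match unless the title itself contains the word. That is why the field name has to be in the array.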

I've combined my search-related configs into a single file because I care about a neat solution: "Feature.TabSection.Indexing.config". It contains both entries and looks like this.

<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <solutionFramework>
      <indexing>
        <providers>
          <add name="tabSection" type="Sitecore.Feature.TabSection.Indexing.TabSectionIndexContentProvider, Sitecore.Feature.TabSection"
               patch:before="add[@name='fallback']" />
        </providers>
      </indexing>
    </solutionFramework>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration
          type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <fields hint="raw:AddComputedIndexField">
            <field fieldName="tabsContent" type="Sitecore.Feature.TabSection.Indexing.TabSectionField, Sitecore.Feature.TabSection" storageType="yes"  indexType="tokenized" />
          </fields>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

With that deployed into my website, along with the built feature, I now can pull up the content associated with an item's tabs in a site search.
