PROCESSORS - pantheon-systems/search_api_pantheon GitHub Wiki

Processors are used by Drupal to alter the index and to deal with Drupal's data idiosyncrasies. Many processors are unnecessary because the Solr server itself solves the problem more gracefully, leaving the Drupal server free to worry about other things.

Processors marked with "IT IS RECOMMENDED NOT TO USE THIS PROCESSOR WITH THE SELECTED SERVER" should not be used, because the Solr server has a better way of handling the problem.

Boost more recent dates

To boost more recent dates, you need an indexed date field in your list of Solr fields. Add the "authored on/created" field in the "fields" tab and make sure the "type" of the field is set to "date". When you go back to the processors tab, that field will appear in the "processor settings" area at the bottom of the page, where you can choose how much to boost it. A good place to start is "1", meaning newer articles will count only slightly more heavily than older ones. Increase the boost to fit your search needs.
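
To get a feel for how a recency boost behaves, here is a minimal Python sketch built on Solr's documented `recip(x, m, a, b)` function, which computes `a / (m * x + b)`. How this processor actually maps its boost setting onto a Solr function is not stated on this page, so `recency_multiplier` and its parameter choices are assumptions made purely for illustration.

```python
# Milliseconds in an average year; 1 / MS_PER_YEAR is roughly 3.16e-11.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Reciprocal function with the same shape as Solr's recip(): a / (m*x + b)."""
    return a / (m * x + b)


def recency_multiplier(age_ms, boost=1.0):
    """Hypothetical helper: multiplier for a document that is age_ms old.

    Assumption: the configured boost is used as the numerator `a`, so a
    brand-new document gets `boost` and the value decays as the document ages.
    """
    return recip(age_ms, 1.0 / MS_PER_YEAR, boost, 1.0)


if __name__ == "__main__":
    for years in (0, 1, 5, 10):
        print(years, "year(s) old:", round(recency_multiplier(years * MS_PER_YEAR), 3))
```

With these assumed parameters the multiplier halves after one year and keeps falling slowly, which matches the idea that a boost of "1" favors newer content only slightly.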

Content Access

Many websites have content that is restricted to specific users or user roles. If your site is browsed primarily by anonymous users and all your content is freely available to anyone, you probably don't need this processor.

Double Quote Workaround

Only use this if you're having issues with double quotes in your search results. Make sure you limit it to search fields that will actually have quotes in them. Don't use it on URL or tag fields.

Empty Status

Most of these processors exist to fix problems caused by Drupal modules interacting in ways that produce undesired search results. In this case, the Workflows module can produce content that is flagged as "published" but does not actually have entity.status=true in Drupal. If you don't use Workflows, or any way of publishing content other than the "published" checkbox, you probably don't need this processor.

Highlight

This processor puts <strong> tags around the portion of the search result that caused its inclusion in the query results.
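
The effect can be sketched in a few lines of Python. This is not the processor's implementation; `highlight` is a hypothetical helper that wraps whole-word, case-insensitive matches of the search terms in <strong> tags.

```python
import re


def highlight(text, terms, prefix="<strong>", suffix="</strong>"):
    """Wrap each whole-word occurrence of any search term in prefix/suffix."""
    words = [re.escape(term) for term in terms if term]
    if not words:
        return text
    # \b keeps "search" from matching inside "research".
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: prefix + match.group(0) + suffix, text)


if __name__ == "__main__":
    print(highlight("Solr makes search fast", ["search"]))
```

The original casing of the matched text is preserved, so a query for "search" still highlights "Search" as written in the result.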

HTML Filter

This filters out irrelevant <div> and <span> tags, so if you search for the word "span" your results will be ranked correctly.
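
As a rough illustration of why this matters, the Python sketch below strips markup so that only the visible text would be indexed. It uses the standard library's `html.parser` and is only an approximation of the idea; `strip_tags` is a hypothetical name, not part of Search API.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only text nodes, discarding tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    """Return the visible text of an HTML fragment with whitespace collapsed.

    Text nodes are joined with a space so adjacent block elements do not run
    together; this simplification would also split a word that is only
    partly wrapped in an inline tag.
    """
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_tags('<div class="teaser"><span>Hello</span> world</div>'))
```

After filtering, the word "span" no longer appears in the indexed text unless the author actually wrote it in the content.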

Ignore Case

Ignore Case should not be used with Solr 8. All Solr 8 Drupal indices are case insensitive by default.

Ignore Characters

Ignore Characters gives you a regex field for characters that should be ignored completely in the index. You would generally use this for things like emoji and to weed out spurious characters from platform-specific formats like Microsoft Word.
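
A minimal Python sketch of the idea follows. The character class is an example chosen for illustration (a common emoji block plus the curly quotes Word tends to insert); it is not the processor's default, and `strip_ignored` is a hypothetical helper.

```python
import re

# Assumed example pattern: emoji in U+1F300..U+1FAFF and curly quotes.
IGNORED = re.compile("[\U0001F300-\U0001FAFF\u2018\u2019\u201C\u201D]")


def strip_ignored(text):
    """Remove every character matched by the ignore pattern."""
    return IGNORED.sub("", text)


if __name__ == "__main__":
    print(strip_ignored("Launch day \U0001F680 \u201cnow\u201d"))
```

Note that the characters are dropped rather than replaced, so the surrounding whitespace is left as it was.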

Number field-based boosting

Use number field boosting when users "rate" content on how relevant it is to a search and you want to boost the relevancy score accordingly.

Regular expression based replacements

Exactly what it sounds like. You supply a regular expression to change content before it's shown to the user.

Reverse Entity References

Useful when you're indexing entity reference fields and you want to include the referenced entity with the content-indexed entity. For instance, you might have an attached metadata file for music that contains the songwriter and various recordings, and you want that information indexed along with the lyrics.

Role-based access

Access check based on Drupal roles. It excludes role-restricted content from search results unless you're logged in with a suitable role. Useful when you're keeping information behind a privilege wall.
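
Conceptually, the check amounts to comparing the roles allowed to view each item with the roles of the current user. The Python sketch below illustrates that comparison only; the `allowed_roles` key and the `visible_results` function are hypothetical and do not reflect how the processor stores or evaluates access data.

```python
def visible_results(results, user_roles):
    """Keep only results that at least one of the user's roles may view."""
    roles = set(user_roles)
    return [item for item in results if roles & set(item["allowed_roles"])]


if __name__ == "__main__":
    results = [
        {"title": "Public page", "allowed_roles": ["anonymous", "authenticated"]},
        {"title": "Staff handbook", "allowed_roles": ["editor"]},
    ]
    for item in visible_results(results, ["anonymous"]):
        print(item["title"])
```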

Stemmer

Read the documentation on stemming search terms. Most of this is done by alternatives files in Solr 8. See the example in search_api_pantheon/modules/search_api_pantheon_example.module.

Stopwords

Use a hook to change the stopwords_{LANG}.txt file. There's an example in the search_api_pantheon/modules/search_api_pantheon_example.module.

IMPORTANT: If you only have one language available on your Drupal site and you don't have translation enabled, all title fields will be language_und, so searches will use stopwords_und.txt. It's probably a good idea to make sure stopwords_{default_lang}.txt and stopwords_und.txt have similar content.
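
One way to check that the two files stay in step is to compare them as sets. The Python sketch below assumes the common layout of one stopword per line with `#` starting a comment line; `parse_stopwords` and `stopword_diff` are hypothetical helpers, and the sample contents are made up for the example.

```python
def parse_stopwords(text):
    """Parse stopwords file content: one word per line, '#' lines are comments."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line.lower())
    return words


def stopword_diff(default_text, und_text):
    """Return (only_in_default, only_in_und) as sorted lists."""
    default_words = parse_stopwords(default_text)
    und_words = parse_stopwords(und_text)
    return sorted(default_words - und_words), sorted(und_words - default_words)


if __name__ == "__main__":
    stopwords_en = "# English\na\nan\nthe\n"
    stopwords_und = "a\nthe\nof\n"
    only_default, only_und = stopword_diff(stopwords_en, stopwords_und)
    print("missing from stopwords_und.txt:", only_default)
    print("missing from default language file:", only_und)
```

If both lists come back empty, the two files will filter the same words regardless of which language code a field ends up with.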

Tokenizer

Use a hook to change the TOKENS file before it's sent to the server. See the example in search_api_pantheon/modules/search_api_pantheon_example.module.

Transliteration

Use a hook to change the ACCENTS.txt and PROTOWORDS.txt files before they are sent to the server. See the example in search_api_pantheon/modules/search_api_pantheon_example.module.

Type-specific boosting

Boost indexed items based on their datasource and/or bundle id/name.
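
The idea can be sketched in Python as a lookup from datasource and bundle to a multiplier. The boost values, the fallback order, and the `boosted_score` helper are assumptions for illustration; the processor's own storage format and the point at which it applies the boost are not described here.

```python
# Hypothetical configuration: (datasource, bundle) -> boost multiplier.
# A bundle of None acts as the default for the whole datasource.
TYPE_BOOSTS = {
    ("entity:node", "article"): 2.0,
    ("entity:node", None): 1.0,
    ("entity:taxonomy_term", None): 0.5,
}


def boosted_score(score, datasource, bundle, boosts=TYPE_BOOSTS):
    """Scale a score by the bundle boost, then the datasource default, else 1.0."""
    factor = boosts.get((datasource, bundle), boosts.get((datasource, None), 1.0))
    return score * factor


if __name__ == "__main__":
    print(boosted_score(1.5, "entity:node", "article"))
    print(boosted_score(1.5, "entity:taxonomy_term", "tags"))
```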
