Sitecore Solr

24th July 2019

Find out how enterprises can leverage Sitecore Solr and why indexing is crucial to your business’s website.

Sitecore Solr is an indexing technology. Sitecore supports two search engines, Lucene and Solr, which are used to search Sitecore’s content and operational databases. But how does Sitecore Solr work, and why should organisations bother using it?

Most developers in the Sitecore community are pretty clued up on the differences between Solr and Lucene and their benefits, but there’s little information available for a non-technical audience – until now.

Understanding indexing

Firstly, to understand Sitecore Solr, you’ll need to know what is meant by an ‘indexing technology’.

Sitecore Solr is not dissimilar to a library’s indexing system. An index dictates how library books are stored; without this order, finding a book would be chaos. Sure, you would still be able to find books eventually, but probably not the ones you want, and not quickly. Most libraries have a consistent structure: high-level groupings by subject, each ordered alphabetically. This is designed so that anyone can walk into a library, go to the section they want and find a book.

When library visitors have trouble finding a book, they can usually turn to a basic index database that stores, for every available book, the key details people regularly search by:

  • Title
  • Themes
  • Author
  • ISBN
  • Edition
  • Publish date

The index doesn’t store a full copy of the contents of the book, but it will have enough information to help you find the book you want quickly. A library wouldn’t function very well without this index, or if the index was slow to search or inaccurate.

The content of your Sitecore website is not dissimilar to the library. You probably have the content neatly divided into sections and have intuitive navigation tools to help your general website user find the content they want. But what if your user wants to find all pages that mention a specific term or phrase? Without an index, this would be equivalent to going through every page of every book in the library. Not only would that take a long time, but you’d also end up exerting a fair amount of physical effort covering every corner of the library and opening every book.

Sitecore Solr is the indexing technology that helps businesses overcome this. Developers can write code to go through every Item in your Sitecore content databases and find content by brute force, but that would be a poor user experience. A single search would take a lot of computing resources. Every additional search would repeat that work, so resource use would climb with each user who searches, and your website would likely crash quickly.
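To make the difference concrete, here is a minimal sketch in Python. It is not Sitecore or Solr code: the page names and text are made up, and a simple word-to-pages lookup stands in for a real search index. It only aims to show why looking a term up in an index is cheaper than rereading every page on every search.

```python
from collections import defaultdict

# Hypothetical pages standing in for website content (illustrative data only).
PAGES = {
    "home": "welcome to our library of sitecore articles",
    "search": "solr keeps search fast as content grows",
    "contact": "contact our sitecore experts today",
}


def brute_force_search(term):
    """Read every page in full, every time someone searches."""
    return sorted(page for page, text in PAGES.items() if term in text.split())


def build_index(pages):
    """Read every page once, recording which pages contain each word."""
    index = defaultdict(set)
    for page, text in pages.items():
        for word in text.split():
            index[word].add(page)
    return index


INDEX = build_index(PAGES)


def indexed_search(term):
    """Look the term up directly, without rereading any page."""
    return sorted(INDEX.get(term, set()))


print(brute_force_search("sitecore"))  # ['contact', 'home']
print(indexed_search("sitecore"))      # ['contact', 'home']
```

Both searches return the same pages. The brute-force version does work proportional to the size of the whole site on every search, while the indexed version pays that cost once, when the index is built.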

Indexes are important to the operation of enterprise CMS platforms like Sitecore, not just for your end users but also for the team that manages content and builds pages. They also allow efficient searching and use of the analytics data within Sitecore xDB.

Indexes and Sitecore

Indexes have been a key part of Sitecore for as long as we can remember. In the early days, there was only one option for an indexing technology: Lucene. As Sitecore became more complex, Lucene started to present some challenges, and so, as of later revisions of Sitecore 6, it became possible to use Sitecore Solr as an alternative.

Solr support is a huge feature in Sitecore. If implemented and used properly it can enable your platform to scale efficiently whilst helping to ensure lightning-quick load times for your pages. To understand why, you need to know a little about the differences between Lucene and Solr.

Lucene vs Sitecore Solr

The most obvious difference between Lucene and Solr is the location of the indexes. Lucene indexes exist on each server that needs to read from the index, whilst Solr runs on its own machines and all servers share a single copy of the index. As Sitecore has evolved, its architecture has meant that it runs on more and more servers, with each server being more specialised to suit its function.

This fact is important for two main reasons:

1. Creating an index uses up a lot of resources.

Reading from an index also uses a substantial amount of processing power. If the index is being created on a server that is also performing other important tasks, then there are fewer resources available to that server to perform its main functions.

2. Indexes are updated frequently.

In a system like Sitecore, this happens as a result of messages being sent between servers to say that an index needs updating. To update, an index has to read content from the databases. It is very hard for all of this to happen simultaneously on multiple servers, so separate copies of an index can fall out of step with each other. That is not too problematic for your website content, but it is potentially critical for the indexes of user behaviour that are used to power Sitecore personalisation.

If you run your entire Sitecore instance from one server, firstly, you probably shouldn’t be using Sitecore, but secondly, you can use Lucene with no problems.

But if you have more than one server in your production environment, then you definitely need to use a “centralised” indexing technology like Solr or Azure Search.

Solr will run on one or more servers, but these servers are dedicated to Solr only and they will serve all other servers in your environment with indexing functionality. We can see that Sitecore Solr instantly solves the two problems above. Azure Search also solves these problems, but it is a newer technology than Solr, less established and only really of benefit in a Platform as a Service (PaaS) cloud hosting model. Solr has the advantage of working in almost any scenario.
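The consistency point can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions, not how Lucene or Solr are implemented: plain dictionaries stand in for indexes, and the class names are hypothetical.

```python
# Illustrative model only: plain dictionaries stand in for search indexes.


class LocalIndexServer:
    """A web server holding its own copy of the index (the Lucene-style layout)."""

    def __init__(self):
        self.index = {}

    def apply_update(self, item, text):
        self.index[item] = text

    def search(self, item):
        return self.index.get(item)


class SharedIndexServer:
    """A web server that reads from one central index (the Solr-style layout)."""

    def __init__(self, shared_index):
        self.shared_index = shared_index

    def search(self, item):
        return self.shared_index.get(item)


# Per-server copies: the update message has reached server A but not yet server B.
server_a, server_b = LocalIndexServer(), LocalIndexServer()
server_a.apply_update("home", "new headline")
local_results = (server_a.search("home"), server_b.search("home"))

# Shared index: a single update is visible to every server at once.
central_index = {}
server_c, server_d = SharedIndexServer(central_index), SharedIndexServer(central_index)
central_index["home"] = "new headline"
shared_results = (server_c.search("home"), server_d.search("home"))

print(local_results)   # ('new headline', None)
print(shared_results)  # ('new headline', 'new headline')
```

With per-server copies, two visitors can get different answers depending on which server handles their request. With a shared index, there is only one copy to keep up to date.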

How to configure Sitecore Solr

One of Sitecore’s main strengths is how flexible it is when it comes to configuration. If you look at any CMS comparisons, Sitecore is leagues ahead of its competitors on this aspect because enterprises can run it on one server or on multiple servers distributed all over the globe. You can deploy it on internal infrastructure, on virtual machines or on PaaS components in the cloud.

Sitecore Solr is equally flexible. Because Solr is critical to the website, running it on a single server presents an unacceptable risk: a single point of failure. We recommend running Solr on multiple servers (normally in odd numbers: 3, 5, 7 and so on). It comes with multiple scaling options and can be used in conjunction with a technology called ZooKeeper, which manages consistency across all the Solr servers in a cluster.
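The preference for odd numbers comes down to simple arithmetic. The sketch below assumes the cluster must keep a strict majority of its nodes running to stay available, which is how a ZooKeeper ensemble reaches agreement. The function names are illustrative only.

```python
def majority(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority is still running."""
    return nodes - majority(nodes)


for nodes in range(1, 8):
    print(f"{nodes} nodes: majority {majority(nodes)}, "
          f"can lose {tolerated_failures(nodes)}")
```

Under that assumption, 3 nodes and 4 nodes both survive the loss of only one node, and 5 and 6 both survive the loss of two. Adding a node to reach an even number adds cost without adding resilience.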

Is your organisation using Sitecore Solr?

Just like most Sitecore features, Solr is incredibly powerful. Its power and flexibility in configuration result in great options for your organisation. Working with a Sitecore Partner with expertise in installing and configuring Sitecore Solr is paramount if you want to get the right configuration for your business.

When it comes to Sitecore, we’re proud to be top of the class. We have a wealth of expertise and experience at our disposal that our clients leverage, including some incredible work from one of our Sitecore MVPs, who created scripts to automate Solr installations. Those scripts were so highly regarded that they’re now integrated into Sitecore installation tools.

Want to know more about how we can help you get the most out of Sitecore? Contact us today to have a chat with one of our experts – we’d love to help.