Sitecore 9 architecture

19th November 2019

Planning a new deployment with Sitecore involves making a series of important choices. Find out how to get the most from Sitecore 9 architecture.

When you’re planning a new Sitecore deployment, one of the key choices developers have to make is how many servers are needed and what each is going to do. That’s the overall architecture of your deployment, and it’s not an easy set of decisions to make. A full deployment of Sitecore 9.2 can include 50 roles, each of which needs a home on your servers. We’ve pulled together a list of the key things for organisations to consider when planning a Sitecore 9 architecture.

1. Where can Sitecore 9 architecture be deployed?

Sitecore is flexible about where you choose to deploy it. It’ll run happily on:

  • Physical hardware or virtual servers in your own data centre
  • Physical or virtual servers in a hosting service’s co-location centre
  • Any cloud-based Infrastructure-as-a-Service (IaaS) provider’s resources
  • Microsoft’s Azure Platform-as-a-Service (PaaS) resources

You can mix and match these deployment types as well, if required. It’s not uncommon for Sitecore Azure PaaS to have some IaaS virtual machines deployed in parallel to support code that can’t run on PaaS. Pick the infrastructure that makes most sense for your business and its goals. As long as it can run Windows Server, it should be fine.

2. Store some data

When using Sitecore to manage content and gather analytics data, it’s essential to give it somewhere to put all that data, which means databases. Sitecore supports Microsoft’s SQL Server if you’re deploying to physical servers or virtual machines, and it supports SQL Azure if you’re going for a platform-as-a-service (PaaS) style of deployment. You can also use MongoDB for collecting analytics data, if that suits your infrastructure better.

There are a variety of different things that get stored in databases:

Content

Unsurprisingly for a CMS platform, there are key databases for content. There are three out of the box, which are for:

  1. Sitecore’s internal data, which can include user account data as well.
  2. The editors’ view of the website data – which keeps all the history as well as the not-yet-published content.
  3. Storing the current state of your site that’s being served to the internet.

But if you need to be able to publish content to multiple separate locations, you may choose to add more databases to this list. Sitecore also has two different editing interfaces for working with this content: the Experience Editor and the Content Editor.

User registration data

You have the option to split out the data store for your user accounts from the internal content database mentioned above. Typically, this happens when organisations are using Sitecore to manage a log-in on their public website and want to put this data close to their public web servers.

Session state data

You probably have more than one server for your public site, sat behind a load balancer. That means any data that needs to be available between requests needs to be stored in a central location, so it doesn’t matter which web server serves your request. The session state database(s) deal with that for you. These are likely to be in SQL Server by default, but Sitecore also supports Redis for this data storage role.
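
To see why the central store matters, here is a minimal Python sketch. It is an illustration only, not Sitecore code: the WebServer class, the server names and the round-robin rotation are all assumptions made for the example, and a plain dictionary stands in for the SQL Server or Redis session store.

```python
from itertools import cycle


class WebServer:
    """A hypothetical delivery server that keeps session data in the store it is given."""

    def __init__(self, name, session_store):
        self.name = name
        self.session_store = session_store

    def handle(self, session_id, item):
        # Add the item to this visitor's session and return what the server can see.
        basket = self.session_store.setdefault(session_id, [])
        basket.append(item)
        return list(basket)


def run_requests(servers, session_id, items):
    """Send one request per item, rotating across servers like a round-robin load balancer."""
    rotation = cycle(servers)
    seen = []
    for item in items:
        seen = next(rotation).handle(session_id, item)
    return seen


# Case 1: each server keeps sessions in its own memory, so the visitor's state is split.
local_servers = [WebServer("cd1", {}), WebServer("cd2", {})]
local_result = run_requests(local_servers, "visitor-1", ["a", "b", "c"])

# Case 2: both servers share one central store (a dict standing in for SQL Server or Redis).
central_store = {}
shared_servers = [WebServer("cd1", central_store), WebServer("cd2", central_store)]
shared_result = run_requests(shared_servers, "visitor-1", ["a", "b", "c"])
```

With private memory the last server to answer only knows about the items it handled itself, while with the central store every server sees the whole session.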

Raw analytics data

While people are browsing your site, Sitecore can be collecting data about where they came from and what they’re doing. These databases have the potential to grow quite big over time. So, Sitecore allows you to use MongoDB and its ability to split the data easily over multiple servers, if you prefer that approach to managing large instances of SQL Server. But you can always prune the old data to keep this more manageable. If you make use of the Universal Tracker to capture analytics from places that aren’t Sitecore websites, then that has an extra database to include here.

Analytics reports

In the background, Sitecore processes the raw analytics data into a structure that’s easier for drawing pretty graphs. This aggregated data is smaller than the raw information but is less detailed.
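
As a rough illustration of that trade-off, the Python sketch below rolls a handful of raw interactions up into daily page-view counts. The field names and sample data are invented for the example, and this is not how Sitecore's own aggregation is implemented.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw interactions: one row per page view, including who viewed it.
raw_interactions = [
    {"contact": "c1", "page": "/home", "at": datetime(2019, 11, 18, 9, 5)},
    {"contact": "c2", "page": "/home", "at": datetime(2019, 11, 18, 14, 30)},
    {"contact": "c1", "page": "/products", "at": datetime(2019, 11, 18, 9, 7)},
    {"contact": "c3", "page": "/home", "at": datetime(2019, 11, 19, 11, 0)},
]


def aggregate_daily_page_views(interactions):
    """Count views per (day, page). The contact and exact time are dropped on purpose."""
    counts = Counter()
    for interaction in interactions:
        day = interaction["at"].date().isoformat()
        counts[(day, interaction["page"])] += 1
    return dict(counts)


daily_views = aggregate_daily_page_views(raw_interactions)
```

Four raw rows become three aggregated ones. The result is quick to chart, but you can no longer tell which contact viewed which page.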

Analytics processing

Managing the processing of all the raw analytics data also involves storing information about what tasks need doing, and which machines will take them on. Sitecore keeps some databases which describe the queue of processing work. Plus, if you make use of Sitecore Cortex’s machine learning framework to enhance your decision making from analytics data, then you have some extra data here too.
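
The idea of a stored work queue can be sketched in a few lines of Python. This is a simplified assumption about how such a queue behaves, not Sitecore's actual schema: the task fields and worker names are hypothetical, and a list in memory stands in for the database table.

```python
def claim_next_task(queue, worker):
    """Mark the first unclaimed task as owned by `worker` and return it, or None if none is left."""
    for task in queue:
        if task["claimed_by"] is None:
            task["claimed_by"] = worker
            return task
    return None


# Hypothetical queue rows: what needs doing, and which machine has taken it on.
queue = [
    {"id": 1, "work": "aggregate interactions batch 1", "claimed_by": None},
    {"id": 2, "work": "aggregate interactions batch 2", "claimed_by": None},
]

first = claim_next_task(queue, "processing-1")
second = claim_next_task(queue, "processing-2")
third = claim_next_task(queue, "processing-1")  # nothing left to claim
```

In a real database the claim would need to happen inside a transaction, so that two machines can never take the same task.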

Internal messaging

Sitecore has moved from the old, “everything runs on a couple of web servers” model of deployment to a modern “cloud friendly” model where there are many smaller roles which communicate. It needs a way for communications to go between the different servers, which is why there is now a database which manages this communication.

Email data

If you’re using the Email Experience Manager features for creating and sending personalised email, then Sitecore needs a database to keep that content in.

Form data

If you make use of the Experience Forms engine to capture data from your website users, you’ll need a database to remember the data that got submitted.

Your planning for databases also needs to consider where it’s best to put these on your network. Databases like published content, session state and analytics collection are heavily used by your internet-facing web servers, so it makes sense for this data to be stored close to those machines. Whereas the editors’ content database and the reporting data can be better located close to the servers that provide the content management UI inside your business.

If you’re splitting up these servers for network security purposes, you need to consider whether firewalls and distances across networks might affect your performance. You might want to consider having more than one database server.
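
One way to reason about this on paper is to list which roles talk to which databases, then pick out the connections that cross a network boundary. The Python sketch below does that for the placements described above. The zone names, labels and the list of connections are assumptions for illustration, so swap in your own.

```python
# Hypothetical network zones for some roles and databases.
ZONE_OF = {
    "content delivery": "public",
    "content management": "internal",
    "published content": "public",
    "session state": "public",
    "analytics collection": "public",
    "editors' content": "internal",
    "analytics reporting": "internal",
}

# Which role uses which database (publishing means content management writes published content).
USES = [
    ("content delivery", "published content"),
    ("content delivery", "session state"),
    ("content delivery", "analytics collection"),
    ("content management", "editors' content"),
    ("content management", "analytics reporting"),
    ("content management", "published content"),
]


def cross_zone_links(zone_of, uses):
    """Return the (role, database) pairs whose traffic has to cross between zones."""
    return [(role, db) for role, db in uses if zone_of[role] != zone_of[db]]


links_to_review = cross_zone_links(ZONE_OF, USES)
```

Each pair in links_to_review is a connection where firewall rules and network latency are worth checking before you commit to a layout.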

You also need to consider what level of redundancy and scalability you need for your data. SQL Server and MongoDB both have options for spreading data across servers to enable larger databases, and having more than one database server to try and prevent outages caused by hardware failures. Next you’ll need to decide how you will approach scaling and resilience.

3. Using search indexes to query

The data in your databases is necessary to make your sites work, but it’s not necessarily stored in a format that’s fast to access when your site needs to look through lots of it. Sitecore integrates with a search and indexing framework to speed up searching through the data.
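
The reason an index is faster can be shown with a toy inverted index in Python. This only illustrates the general idea behind search engines. It is not how Sitecore, Solr or Azure Search are configured, and the item IDs and text are made up.

```python
from collections import defaultdict


def build_index(items):
    """Map each word to the set of item IDs that contain it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for word in text.lower().split():
            index[word].add(item_id)
    return index


def search(index, query):
    """Return the IDs of items containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical content items.
items = {
    "item-1": "Planning a Sitecore deployment",
    "item-2": "Sitecore search indexes explained",
    "item-3": "Planning your holiday",
}

index = build_index(items)
matches = search(index, "sitecore planning")
```

The work of reading every item happens once, when the index is built. After that, a query only looks up the words it needs rather than scanning all the content.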

The open-source Apache Solr search engine is great if you’re deploying onto VMs or on-premises, and the Azure Search framework is best suited to platform-as-a-service deployments, though either option can be used with the other approach to deployment if you need to.

Usually the decision-making process comes down to cost and convenience. It’s easy to use Azure Search in a PaaS scenario because Sitecore’s install process can set it up for you. Whatever you choose, there is a collection of search indexes that are required:

  • Content indexes
  • Analytics indexes
  • Marketing and personalisation indexes

How much should developers invest in redundancy and scalability at this point?

Azure services have price points for different levels of performance and scalability. If you choose Solr instead, you decide how many Solr servers you want to deploy.

Like the database servers, you may choose to put search servers close to your public web servers or your content management servers, depending on the overall structure of your network.

With the 9.2 release, Sitecore now supports having a dedicated server for processing content items and updating search indexes. If you make heavy use of search this may help you optimise performance.

4. Run some code

Historically, Sitecore deployments were “just one big ASP.Net application” on all the servers, with some configuration tweaks to differentiate them. But as Sitecore’s developers work through the process of moving from that monolith to a set of smaller applications which better suit modern deployments, the number of code roles you can deploy has increased.

It’s important to plan for which roles are relevant to your solution for deployment and decide how you plan to split them between servers. You might want to put multiple roles onto one server (or app service plan) to save money on roles that won’t use much CPU time.
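
If you have rough load estimates, grouping roles onto servers can be treated as a simple packing exercise. The Python sketch below uses a first-fit approach, placing the biggest roles first. The CPU figures are invented numbers and the capacity of 100 is an arbitrary unit, so treat it as a starting point for a conversation rather than a sizing tool.

```python
def pack_roles(role_cpu, capacity):
    """First-fit decreasing: place each role on the first server with room, largest roles first."""
    servers = []
    ordered = sorted(role_cpu.items(), key=lambda pair: pair[1], reverse=True)
    for role, cpu in ordered:
        if cpu > capacity:
            raise ValueError(f"{role} needs more than one server's capacity")
        for server in servers:
            if server["used"] + cpu <= capacity:
                server["roles"].append(role)
                server["used"] += cpu
                break
        else:
            # No existing server has room, so start a new one.
            servers.append({"roles": [role], "used": cpu})
    return servers


# Hypothetical CPU estimates, as a percentage of one server.
estimates = {
    "content-delivery": 100,
    "content-management": 40,
    "processing": 30,
    "marketing-automation": 20,
    "reporting": 5,
    "identity": 5,
}

plan = pack_roles(estimates, capacity=100)
```

With these numbers the busy delivery role gets a server to itself and the five quieter roles share a second one. The sketch ignores network zones and redundancy, which you would still need to layer on top.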

Sitecore describes this set of roles as “Application roles”, and they break down as follows:

  • Content Management

    The editing UI that lets your staff modify and preview content lives in this role. Usually there are just a few of these, as the load for editing content tends to be low.

  • Content Delivery

    These are responsible for serving your published content to the internet. That means load is very dependent on your website’s needs. It is common to deploy two of these as a minimum so that load balancing can prevent outages due to a single server being offline – but some deployments will have multiple machines in different data centres spread around a country or the globe.

  • Analytics and Reference Data APIs

    There are four roles that deal with capturing analytics data from user sessions and exposing it back to the servers that run the website. There’s a role which exposes the APIs for collecting the raw data and allows it to be read back again later. There’s a role which deals with updating search indexes when this data changes, and there’s one which allows searching the data. There’s also a role for the API that allows managing reference data.

  • Analytics Processing and Reporting

    The raw data captured by the analytics API roles needs some processing in order to aggregate it for reporting. There is a role which manages this task, and updates the analytics reporting database based on changes to the raw data. Secondly, there’s a role which exposes the report APIs that allow the content management servers to render graphs and tables in their UI.

  • Marketing automation

    The processing that goes on behind the scenes to run the marketing automation behaviour is split between two roles. The “operations” role provides the APIs that allow people to be enrolled in plans, and for the plans to do work. The “reporting” role provides the APIs that allow editors to find out how many people are in plans, and what their states are.

  • Cortex

    Similar to analytics, there are processing and reporting roles for Cortex. They deal with the process of scheduling work that needs to be processed by your Machine Learning models, and then returning data about the models and their results. But note, these services are just an interface to the models – you still need a process or service that does the actual “thinking” for this to do anything useful.

  • Publishing Service

    The content management servers include a publishing engine, but it’s not great if you have very large content databases or network latency between your master and web databases. The modern replacement for the old publishing engine is available as an optional extra service that you can install.

  • Universal Tracking

    If you want to expose the APIs that enable easy analytics tracking from things that aren’t a website then the tracking and collection roles for the Universal Tracker provide those services. Much like the web analytics, they expose APIs for recording and querying raw data, as well as a role for processing the raw data into the analytics reports.

  • Identity Service

    As Sitecore breaks more parts out of the old website into separate roles, there needs to be a single-sign-on mechanism that means logging into one lets you access UI on all of them. Identity Service provides this UI and API, by integrating with other authentication providers and issuing standardised “tokens” which can grant access to multiple roles.
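
To show how one issued token can be trusted by several roles, here is a deliberately simplified Python sketch using an HMAC signature. It is not Sitecore Identity's real token format or API: the function names, the shared signing key and the token layout are all assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key known to the issuer and to every role that checks tokens.
SIGNING_KEY = b"demo-key-not-for-production"


def issue_token(username, roles):
    """Return 'payload.signature', where the signature proves who issued the payload."""
    payload = json.dumps({"sub": username, "roles": roles}, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(token):
    """Return the token's claims if the signature is valid, otherwise None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # wrong number of parts or invalid base64
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(payload)


token = issue_token("editor", ["content-management", "reporting"])
claims = verify_token(token)
```

Any role holding the key can check the token without calling back to the service that issued it. Real single-sign-on systems usually sign with a private key and verify with a public one, so the roles never hold anything that could be used to forge a token.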

Moving forward with Sitecore 9 architecture

Sitecore can scale from small developer instances of sites (where everything runs in one place) through to large distributed deployments that span continents. That means planning which roles go on specific servers, where the servers live and what hardware resources they have is crucial to a good deployment. The technical aspects differ if you deploy to a PaaS rather than to individual servers (whether virtual or physical) but the basic decisions are the same.

If you need help understanding how best to deploy your site, get in touch with our globally recognised Sitecore experts.

At Kagool we have extensive experience with deployments from laptop-scale developer instances, to production sites that span multiple continents. We’re a long-standing Sitecore Platinum Partner, developing our own methods for using the Sitecore 9 architecture to create award-winning websites.