Big data and machine learning with Sitecore

15th May 2019

Learn the challenges businesses should consider when introducing big data and machine learning to websites

How to get the most from big data and machine learning in Sitecore

What are big data and machine learning?

Big data is the term given to a large volume of structured and unstructured data that enterprises manage on a daily basis. Businesses need to have tools in place to analyse this data to gain insights that can arm them to make better strategic decisions.

Machine learning is an application of artificial intelligence (AI) that enables software to learn and improve from experience automatically, without being explicitly programmed. Together, big data and machine learning are revolutionising how organisations approach digital marketing.

Digitally savvy marketers are using machine learning to understand and anticipate what their customers need, and to provide compelling personalised user experiences. Embarking on a machine learning project is no mean feat, though. It comes with challenges that many businesses fail to overcome, and some decide not to proceed with machine learning at all.

Dan Ariely, Professor of Psychology and Behavioural Economics, sums up big data and machine learning as being “like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it”. For Sitecore users, this doesn’t need to be the case.

Sitecore Cortex provides the means to implement machine learning on your website. There are 9 considerations businesses should address for a successful implementation.

Defining the problem

Machine learning is all about trying to solve a problem. Before embarking on a project, you should have a clear idea of what you’re setting out to achieve. The problem should also be backed by a digital marketing strategy.

Here are some examples of problems that machine learning can solve from a digital marketer’s perspective:

  • Content recommendation – suggest content for the current user based on other users’ past visit data
  • User segmentation – group website users with similar traits or behaviours
  • Predictive analytics – spot trends and analyse user behaviour
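
To make the first of these problems concrete, here is a minimal sketch of content recommendation in Python. It is not how Sitecore Cortex works internally. It assumes a hypothetical visit history held in a plain dictionary, and it swaps a trained model for a simple co-visit count: pages are scored by how often they were viewed by other users who share a page with the current user.

```python
from collections import Counter

# Hypothetical visit history: contact ID -> set of page paths viewed.
visits = {
    "contact-1": {"/products/bikes", "/blog/cycling-tips", "/offers"},
    "contact-2": {"/products/bikes", "/blog/cycling-tips"},
    "contact-3": {"/products/helmets", "/offers"},
}


def recommend(current_pages, history, top_n=2):
    """Suggest pages viewed by other users who share a page with the current user."""
    scores = Counter()
    for pages in history.values():
        overlap = len(current_pages & pages)
        if overlap == 0:
            continue  # this user has nothing in common with the current user
        for page in pages - current_pages:
            scores[page] += overlap
    # Highest score first; ties broken alphabetically so the output is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:top_n]]


suggestions = recommend({"/products/bikes"}, visits)
print(suggestions)
```

A real project would replace the counting step with a model chosen by your data scientist, but the inputs (past visits) and the output (a ranked list of content) stay much the same.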

Getting the right people

A big data and machine learning project in Sitecore requires a machine learning model to be created. This requires expertise in data science.

Demand for data scientists is growing as more companies try to make sense of the vast amounts of data they collect and work out how it can be used in machine learning.

A data scientist will have a background in maths and programming, and should have the experience to understand the problem you are trying to solve and to propose the tools and algorithms needed to solve it. However, they may not be familiar with machine learning in Sitecore, so developers, marketers and data scientists need to work together to build the whole solution.

Sitecore have produced a great video on where data scientists fit in with the machine learning solution within Sitecore.

Choosing the right platform

There are several machine learning platforms available, and choosing the right one can seem daunting. The main thing to consider is how the platform will fit with your existing infrastructure. If you’re using Sitecore, you’re more than likely a Windows-based business, so it might be more cost-effective to stick with Microsoft tools such as Microsoft Machine Learning Server and CNTK or, if you prefer the cloud, Azure Machine Learning Service. If you’re running Linux, TensorFlow or Keras might be better suited to your operating system.

You will also need to keep in mind how your website will integrate with the platform. You should find out whether the platform has an API available that allows remote training and evaluation.


Getting the right data

For machine learning algorithms to be useful, organisations need the right data, both to train a model initially and to retrain it so that it keeps making accurate predictions. One of the main issues, often called the cold start problem, is the lack of useful data available to train a machine learning model. An experienced data scientist should check that the data you are collecting is the right type of data and that it is representative of what actually happened.
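
One simple way to sanity-check whether data is representative can be sketched in Python as follows. This is an illustration only, not a substitute for a data scientist’s analysis. It assumes two hypothetical, non-empty lists of device types, one for the training sample and one for all traffic, and compares the share of each category in both. The 0.1 tolerance is an arbitrary assumption.

```python
from collections import Counter


def shares(values):
    """Return the proportion of each category in a non-empty list."""
    counts = Counter(values)
    total = len(values)
    return {key: count / total for key, count in counts.items()}


def largest_share_gap(sample, population):
    """Largest absolute difference in category share between two datasets."""
    sample_shares = shares(sample)
    population_shares = shares(population)
    categories = set(sample_shares) | set(population_shares)
    return max(
        abs(sample_shares.get(c, 0.0) - population_shares.get(c, 0.0))
        for c in categories
    )


# Hypothetical data: the training sample over-represents desktop visitors.
training_sample = ["desktop"] * 8 + ["mobile"] * 2
all_traffic = ["desktop"] * 5 + ["mobile"] * 5

gap = largest_share_gap(training_sample, all_traffic)
needs_review = gap > 0.1  # hypothetical tolerance, to be agreed with your data scientist
print(f"Largest gap in share: {gap:.2f}, needs review: {needs_review}")
```

A large gap suggests that a model trained on the sample would learn mostly from one kind of visitor, so its predictions for everyone else may be less reliable.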

The tools

Once the above challenges have been addressed, you can start to look at how Sitecore features and tools can make everything work!

Cortex processing engine

The Sitecore Cortex processing engine, as the name suggests, is the brains and controller behind the whole machine learning process. The engine gives users the means to process data at scale from any chosen source, and comes with built-in workers for processing data from Sitecore xDB. The processing engine runs in its own process as a Windows service, but can accept scheduled tasks from any other Sitecore role using the message bus.

Cortex has been designed specifically with big data and machine learning in mind. It allows website contact data to be projected into a tabular format and then sent to Microsoft Machine Learning Server, using built-in integration via MrsDeploy, or to a machine learning platform of your choice.

The processing engine comes with a framework that gives organisations full control over how the data gets processed and stored using custom workers and models. Another benefit is that the framework provides a means to integrate with a machine learning platform.

xDB and xConnect

In the past, the data in xDB was siloed away and hard to get at. Since Sitecore xConnect was introduced, Sitecore users can collect data from anywhere and store it in xDB via a REST service. The new Sitecore 9 architecture also makes it possible to scale xDB to store big data.

Projection framework

In order to train a machine learning model, the data needs to be in a tabular form, but the data stored in xDB is not in this format. The projection framework allows you to process your xDB data at scale into a tabular format ready for training. The projection takes place in a multi-threaded way and allows the resulting tables to be merged together before training takes place.
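
The idea of projection can be illustrated with a short Python sketch. It does not use the Sitecore projection framework or its API. The contact records, field names and helper functions are all hypothetical, and real xDB data is far richer. The sketch flattens nested contact data into two tables using a thread pool, then merges them on the contact ID to produce one row per contact.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, simplified contact records with nested interactions.
contacts = [
    {"id": "c1", "interactions": [
        {"channel": "web", "page_views": 5, "goal_reached": True},
        {"channel": "email", "page_views": 2, "goal_reached": False},
    ]},
    {"id": "c2", "interactions": [
        {"channel": "web", "page_views": 1, "goal_reached": False},
    ]},
]


def project_engagement(contact):
    """Project one contact into a flat row of engagement features."""
    interactions = contact["interactions"]
    return {
        "contact_id": contact["id"],
        "visits": len(interactions),
        "total_page_views": sum(i["page_views"] for i in interactions),
    }


def project_conversion(contact):
    """Project one contact into a flat row holding the training label."""
    converted = any(i["goal_reached"] for i in contact["interactions"])
    return {"contact_id": contact["id"], "converted": int(converted)}


def merge_tables(left, right, key="contact_id"):
    """Join two lists of rows on a shared key, keeping rows present in both."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Run the projections on a small thread pool; map() keeps the input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    engagement = list(pool.map(project_engagement, contacts))
    conversion = list(pool.map(project_conversion, contacts))

training_table = merge_tables(engagement, conversion)
for row in training_table:
    print(row)
```

Each row in the merged table now has a fixed set of columns, which is the shape most machine learning platforms expect for training.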


Message bus

Finally, the message bus pulls the whole process together. It acts as a messenger to all parts of the solution and enables communication across boundaries.

You can send an instruction to start training or evaluating a machine learning model from any Sitecore role in the platform. This can be used to great effect when a user is interacting with the frontend website and needs to be presented with personalised data based on a prediction from a machine learning model. With the message bus, all this can happen behind the scenes in a scaled and fault-tolerant way.
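
The pattern described here, where one part of the solution posts an instruction and another part picks it up and does the work, can be sketched in Python. This is not the Sitecore message bus. It assumes an in-process queue as a stand-in for the bus, a hypothetical message format, and a placeholder scoring function in place of a real model.

```python
import queue
import threading

bus = queue.Queue()  # in-process stand-in for a message bus
results = {}         # stand-in for wherever predictions are stored


def fake_model_score(page_views):
    """Placeholder for a real model: more page views means a higher score."""
    return min(1.0, page_views / 10)


def processing_worker():
    """Take messages off the bus and handle them until told to stop."""
    while True:
        message = bus.get()
        try:
            if message is None:  # sentinel value: shut the worker down
                break
            if message["task"] == "evaluate":
                score = fake_model_score(message["page_views"])
                results[message["contact_id"]] = score
        finally:
            bus.task_done()


worker = threading.Thread(target=processing_worker, daemon=True)
worker.start()

# The "website" side posts an instruction and carries on with its own work.
bus.put({"task": "evaluate", "contact_id": "c1", "page_views": 7})
bus.put(None)

bus.join()     # wait until every message has been handled
worker.join()
print(results)
```

Because the sender only posts a message, it does not need to know where or how the model runs, which is what lets the processing side be scaled separately.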

How is your organisation using big data and machine learning?

If you’re ready to start getting more from your Sitecore solution or perhaps you’re considering implementing Sitecore to reach new audiences and create better personalised user experiences – we can help. Get in touch to discuss your requirements or to book a Sitecore demo.