Part 3 – MongoDB and considerations for hosting it in production

In this series of blog posts I’ll talk about how we approach Sitecore® Experience Platform™ 8 production hosting at Cucumber. As this is a large topic I’ve decided to split it into bite-sized chunks for easier digestion.

Part 1 - Production Hosting
This post outlines what you should consider when planning your production deployment of the Sitecore® Experience Platform™.

Part 2 - Sitecore 7.x
A look back at how things were done before the release of version 8 and the problems that version 8 addresses.

Part 3 - MongoDB
An introduction to the MongoDB NoSQL database and some of the things you should know about hosting it in production.

Part 4 - Sitecore 8.x Standalone
A description of the minimal infrastructure required to host the Sitecore® Experience Platform™ in production based on Sitecore best practices.

Part 5 - Sitecore 8.x High Availability
This post will show an approach to achieving high availability with the Sitecore® Experience Platform™.

Part 6 - Solr
An introduction to Solr, an open source enterprise search platform used by the Sitecore® Experience Platform™.

Part 7 - Sitecore 8.x xDB Cloud
An introduction to Sitecore’s xDB Cloud edition and its current status.

Part 8 - Sitecore 8.x Hybrid Architecture
Designing a hybrid architecture using SaaS offerings.

Part 9 - Sitecore 8.x How to Choose an Architecture
The Cucumber approach to designing Sitecore® Experience Platform™ 8 production infrastructure.

Introduction
In the previous blog post I took a look back at some of the reasons why the Sitecore architecture changed between versions 7.x and 8.x. MongoDB (the name comes from “humongous”) is now a requirement of the Sitecore® Experience Platform™ 8, so in this post I’ll explain what it is and cover some of the considerations for hosting it in production.

MongoDB
MongoDB is a technology that the average Sitecore customer is probably not that familiar with (being a relative newcomer and coming from outside the traditional ASP.NET technology stack). It’s a document database, which is one of the types of NoSQL (originally referring to "non SQL" or "non relational") databases along with key-value, column and graph. These types of databases have been around since the 1960s, but have gained popularity through their use by companies such as Google, Amazon and Facebook due to the massive amounts of data they collect and process, which goes beyond what is reasonably possible with a relational data model. NoSQL databases use alternative data models and compromise on features found in traditional RDBMS in order to meet this challenge.

The CAP Theorem
The CAP theorem, put forward by Eric Brewer in 2000, defines three properties that a distributed system might be required to provide:

  • Consistency (all nodes see the same data at the same time)
  • Availability (a guarantee that every request receives a response about whether it succeeded or failed)
  • Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)

It’s theoretically impossible for a distributed system to meet all three of these requirements at once; it must instead settle for a maximum of two. Typical relational databases such as SQL Server are consistent (ACID - atomicity, consistency, isolation, and durability) and available, meaning they can be considered CA (consistency and availability). In general MongoDB is considered to be CP (consistency and partition tolerance). In contrast to traditional RDBMS it’s designed to store large amounts of data on commodity hardware through horizontal scaling - which I talked about in the first blog post in this series. Additionally, it’s capable of achieving high write speeds, particularly in newer versions. It’s these characteristics which make it a good fit for use as the Sitecore® Experience Platform™ analytics data store.

Data Model
Instead of the more familiar table-based model found in SQL Server, data inside MongoDB is stored as JSON-like documents and is schema-less. Internally, data is stored as BSON (binary JSON), which is a binary encoding of JSON-like documents with similar characteristics to Google’s Protocol Buffers.

The schema-less nature makes integration with certain types of applications simpler, avoiding the need to map row-based relational data structures to object structures. However, some people have recently argued that the idea of schema-less systems is actually a myth: these systems simply move the schema definition into other areas of the code, where it is harder to enforce.
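
To make both points a little more concrete, here is a minimal Python sketch using only the standard library. The field names and the "interactions" collection are hypothetical and are not Sitecore’s actual xDB schema. It shows two differently shaped documents that could sit side by side in one collection, and the kind of check that ends up living in application code once the database no longer enforces a schema.

```python
import json

# Two documents that could live in the same hypothetical "interactions"
# collection. Field names are illustrative only, not Sitecore's xDB schema.
visit_a = {
    "_id": 1,
    "contact": "alice",
    "pages": [
        {"url": "/", "seconds": 12},
        {"url": "/products", "seconds": 40},
    ],
}
visit_b = {
    "_id": 2,
    "contact": "bob",
    "pages": [{"url": "/contact", "seconds": 5}],
    "campaign": "spring-sale",  # extra field, no schema change required
}

# The "schema" has not gone away: it has moved into application code.
REQUIRED_FIELDS = {"_id", "contact", "pages"}


def missing_fields(document):
    """Return the required fields that a document lacks, sorted by name."""
    return sorted(REQUIRED_FIELDS - set(document))


if __name__ == "__main__":
    print(json.dumps(visit_b, indent=2))
    print(missing_fields({"_id": 3, "contact": "carol"}))
```

Nothing stops a third document from omitting "pages" entirely, which is exactly the enforcement problem described above.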

High Availability
High Availability (HA) in MongoDB is achieved through the use of replica sets, which are recommended by MongoDB, Inc. for production installations. Replica sets have three types of members (see the official documentation for the full details):

  • Primaries
    The primary is the only member in the replica set that receives write operations.

  • Secondaries
    A secondary maintains a copy of the primary’s data set. To replicate data, a secondary applies operations from the primary’s oplog to its own data set in an asynchronous process. A replica set can have one or more secondaries.

  • Arbiters
    An arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a vote in elections for primary. Arbiters always have exactly 1 election vote, and thus allow replica sets to have an uneven number of voting members, without the overhead of a member that replicates data.
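
The asynchronous replication described for secondaries can be pictured with a toy model. This is a sketch under heavy simplifying assumptions (an in-memory list standing in for the oplog, and hypothetical class and method names), not how MongoDB is implemented, but it shows why a secondary can briefly lag behind the primary.

```python
class Primary:
    """Toy primary: applies writes and records each one in an oplog."""

    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered list of (key, value) write operations

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    """Toy secondary: replays the primary's oplog some time later."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def catch_up(self, primary):
        # Apply only the operations that have not been seen yet.
        for key, value in primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.oplog)


if __name__ == "__main__":
    primary, secondary = Primary(), Secondary()
    primary.write("visits", 1)
    primary.write("visits", 2)
    print(secondary.data)  # still empty: replication has not run yet
    secondary.catch_up(primary)
    print(secondary.data == primary.data)
```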

The recommended minimum number of members in a replica set is three, so more servers are required to achieve HA than with previous versions of Sitecore. This could mean an increase in the number of Windows Server licenses required, or you could instead opt to run MongoDB on Linux. This would require additional skills to configure and manage, but it is arguably a better solution from a performance perspective, with some going as far as to say that it’s a “Very Bad Idea™ to run a production MongoDB on a Windows System”.
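
The reason three is the magic number comes down to simple arithmetic: a primary can only be elected with the votes of a strict majority of the voting members. The short sketch below (the function names are mine, not part of any MongoDB API) works out how many members can be lost for a given replica set size.

```python
def votes_needed(voting_members: int) -> int:
    """Votes required for a strict majority of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be lost while a primary can still be elected."""
    return voting_members - votes_needed(voting_members)


if __name__ == "__main__":
    print("members  majority  can lose")
    for members in range(1, 6):
        print(f"{members:7}  {votes_needed(members):8}  {fault_tolerance(members):8}")
```

Two members tolerate no failures at all, while three tolerate one. A fourth member adds cost but no extra tolerance, which is why an arbiter is sometimes used to bring the vote count up to an odd number.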

Production Considerations
MongoDB is easy to get started with for development but harder to manage in production, where, like any database, it requires capacity planning, tuning, monitoring and maintenance. This article on highscalability.com does a good job of explaining it. You should run through the production checklist on the MongoDB website to ensure your installation is configured correctly.

Scaling
MongoDB supports large datasets through the use of sharding. As I outlined before, vertical scaling has a practical limit because it relies on increasing the CPU, storage and RAM of a single machine. In contrast, sharding (horizontal scaling) is a method of splitting the data across multiple machines.
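
As a rough illustration of the idea, the sketch below routes records to shards by hashing a key. It is a deliberate simplification: as I understand it, MongoDB actually divides the shard key range into chunks and balances those chunks across shards, rather than taking a hash modulo the shard count. The key format and function names here are assumptions made for the example.

```python
import hashlib


def shard_for(shard_key: str, shard_count: int) -> int:
    """Map a shard key value to a shard index using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(keys, shard_count):
    """Group keys into one list per shard."""
    shards = [[] for _ in range(shard_count)]
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical contact identifiers used as the shard key.
    keys = [f"contact-{i}" for i in range(1000)]
    for index, shard in enumerate(distribute(keys, 3)):
        print(f"shard {index}: {len(shard)} documents")
```

Each machine ends up holding only a portion of the data, so capacity grows by adding machines rather than by buying a bigger one.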

Backups
There are four main methods that can be used to back up your MongoDB data:

  • Backup by Copying Underlying Data Files
    You can make point-in-time backups of your data by using snapshots if your volume manager supports them, as is the case with EBS on AWS and LVM on Linux. If this isn’t possible, standard tools such as rsync and cp can be used, but all writes must be stopped before the backup is taken.
  • Backup a Database with mongodump
    mongodump is a command-line program provided with MongoDB that is similar to mysqldump. It dumps a database or collection out to disk, which can later be restored using the corresponding mongorestore program.
  • MongoDB Cloud Manager Backup
    This is a commercial, fully managed, hosted service from MongoDB, Inc. that provides a backup feature.
  • Ops Manager Backup Software
    Ops Manager is an on-premises solution that has similar functionality to the Cloud Manager version and is available with Enterprise Advanced subscriptions.
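
If you go down the mongodump route, the backup is easy to script. The following is a minimal sketch that builds a mongodump command line from Python. The database name and output directory are hypothetical, and it assumes mongodump is installed and on the PATH of the machine running it.

```python
import shlex
import subprocess


def mongodump_command(database, out_dir, host="localhost", port=27017):
    """Build the argument list for dumping one database to a directory."""
    return [
        "mongodump",
        "--host", host,
        "--port", str(port),
        "--db", database,
        "--out", out_dir,
    ]


def run_backup(database, out_dir):
    """Run mongodump and raise CalledProcessError if it exits with an error."""
    subprocess.run(mongodump_command(database, out_dir), check=True)


if __name__ == "__main__":
    # "analytics" and the target path are placeholders for illustration.
    print(shlex.join(mongodump_command("analytics", "/backups/2016-01-01")))
```

Calling run_backup from a scheduled task would give you a basic nightly dump, although for large data sets the snapshot approach is likely to be quicker.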

Monitoring
MongoDB comes with a set of utilities that provide real-time reporting of database activities. The Cloud Manager and Ops Manager products I mentioned previously also provide monitoring functionality. Monitoring using tools such as New Relic and AppDynamics is also supported.

Conclusion
In this post we took a look at MongoDB, which is now a component of Sitecore® Experience Platform™ 8. We looked at its characteristics and some of the considerations that should be taken into account when planning a production installation. In the next post I’ll introduce the simplest way to install Sitecore® Experience Platform™ 8.