Kubernetes is most famous as a tool for deploying containers. But in its current form, Kubernetes is eminently capable of deploying databases, too—including those with high-availability needs.
Let’s explore how in this article. We’ll start by discussing what exactly high availability means, then look at how to use the Kubernetes StatefulSet API to deploy stateful applications.
Defining High Availability: The CAP Theorem
For years, the famous CAP theorem (first proposed by Dr. Eric Brewer in 2000 and formally proved by Seth Gilbert and Nancy Lynch in 2002) has influenced the way database management systems are designed. The classical interpretation of this theorem states that when you design a distributed system, you have to select two of three characteristics: consistency, availability, and partition tolerance.
There has been much discussion of this interpretation. Some computer scientists have extended the theorem to include latency. Daniel Abadi's PACELC formulation, for example, concludes that “No matter how a DDBS [distributed database system] replicates data, clearly it must trade off consistency and latency.” For his part, Dr. Brewer has written about the impact of CAP and the way NoSQL databases choose to prioritize availability first and consistency second.
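To make the classical trade-off concrete, here is a toy model in Python, not a description of how any real database management system works. Two replicas of a key-value store share one network link, and a flag simulates a partition. The names `ToyCluster`, `Replica`, and `Unavailable` are hypothetical. During a partition, a "CP" cluster rejects a write it cannot replicate, while an "AP" cluster accepts it locally and lets the replicas diverge.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP replica refuses a write it cannot replicate."""


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class ToyCluster:
    """Two replicas of a key-value store joined by a single network link.

    mode="CP": refuse writes during a partition (consistency over availability).
    mode="AP": accept writes locally during a partition (availability over consistency).
    """

    mode: str
    partitioned: bool = False
    a: Replica = field(default_factory=lambda: Replica("a"))
    b: Replica = field(default_factory=lambda: Replica("b"))

    def write(self, target: Replica, key: str, value: str) -> None:
        peer = self.b if target is self.a else self.a
        if not self.partitioned:
            # Healthy link: apply the write synchronously to both replicas.
            target.data[key] = value
            peer.data[key] = value
        elif self.mode == "CP":
            # Peer unreachable: reject the write rather than let replicas diverge.
            raise Unavailable(f"replica {target.name}: peer unreachable")
        else:
            # AP: accept the write locally; the peer is now stale.
            target.data[key] = value

    def consistent(self) -> bool:
        return self.a.data == self.b.data


cp = ToyCluster(mode="CP", partitioned=True)
try:
    cp.write(cp.a, "x", "1")
except Unavailable as err:
    print("CP refused write:", err)  # CP refused write: replica a: peer unreachable
print("CP consistent:", cp.consistent())  # CP consistent: True

ap = ToyCluster(mode="AP", partitioned=True)
ap.write(ap.a, "x", "1")
print("AP consistent:", ap.consistent())  # AP consistent: False
```

Note that even without a partition, the healthy path updates both replicas before returning. In a real system that synchronous round trip is exactly the consistency/latency trade-off mentioned above.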
But what if we prefer a database designed first to adhere to the ACID properties (atomicity, consistency, isolation, and durability), with the rich feature set, wide range of options on the market, reliability, and community support that come with it, over a NoSQL option? There are several ways to strike a good balance between consistency and high availability. One of the most interesting is to use containers and orchestration, but a complex database such as PostgreSQL requires care, and you need to understand a few concepts before you get your hands dirty.
StatefulSets and Databases
Since Kubernetes 1.9 (released in December 2017), the StatefulSet API, which manages the deployment and scaling of a set of pods while guaranteeing the ordering and uniqueness of those pods, has been available in stable form. It makes it possible to deploy stateful applications in a Kubernetes cluster.
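The ordering and uniqueness guarantees are easier to see with a small model. The Python sketch below does not talk to a cluster; it only reproduces the naming and ordering rules. It computes the stable names StatefulSet pods receive (`<statefulset-name>-<ordinal>`), the per-pod DNS names published through the governing headless Service (assuming the default `cluster.local` cluster domain), and the order in which the default `OrderedReady` pod management policy creates and removes pods. The function names and the `postgres`/`db` names in the example are illustrative.

```python
def pod_names(statefulset: str, replicas: int) -> list[str]:
    """Stable pod names: <statefulset-name>-<ordinal>, ordinals 0..replicas-1."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pod_dns(statefulset: str, service: str, namespace: str, replicas: int,
            cluster_domain: str = "cluster.local") -> list[str]:
    """Per-pod DNS names published through the governing headless Service."""
    return [f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
            for pod in pod_names(statefulset, replicas)]


def scale_plan(statefulset: str, current: int, desired: int) -> list[tuple[str, str]]:
    """Order of pod operations under the default OrderedReady policy.

    Scale-up creates pods in ascending ordinal order (each waits for its
    predecessors to be Running and Ready); scale-down removes pods in
    descending ordinal order, highest ordinal first.
    """
    if desired > current:
        return [("create", f"{statefulset}-{i}") for i in range(current, desired)]
    return [("delete", f"{statefulset}-{i}") for i in range(current - 1, desired - 1, -1)]


print(pod_names("postgres", 3))  # ['postgres-0', 'postgres-1', 'postgres-2']
print(pod_dns("postgres", "postgres", "db", 1))  # ['postgres-0.postgres.db.svc.cluster.local']
print(scale_plan("postgres", 3, 1))  # [('delete', 'postgres-2'), ('delete', 'postgres-1')]
```

For a database this matters because `postgres-0` keeps its name, DNS entry, and storage across restarts, so a primary/replica role can be tied to a predictable identity.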
Transactional databases depend on the state of their data, unlike immutable services such as web applications or web servers. A database diverges from its upstream image from the very beginning of the deployment cycle, and that divergence grows over time as the service runs.
Typically, in a transactional database, the state of a database includes not only the data itself but also the following:
- The role of the instance in the cluster.
- The identity of external connections.
- The open sessions.
StatefulSets are well suited to deploying transactional databases, but as the documentation warns, they have some limitations that must be addressed before you use them:
- The storage for a given pod must either be provisioned by a persistent volume provisioner, or pre-provisioned by an admin.
- Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet.
- You must create a headless Service that is responsible for the network identity of the pods.
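The three requirements above map directly onto fields of the manifests. The Python sketch below builds a headless Service (`clusterIP: "None"`) and a StatefulSet whose `volumeClaimTemplates` ask a provisioner for one PersistentVolumeClaim per pod, then prints them as a JSON `List` that `kubectl apply -f -` accepts. It is a skeleton under stated assumptions, not a production PostgreSQL deployment: the helper names, the image `registry.example.com/db:1.0`, and the mount path `/data` are hypothetical placeholders, and a real deployment would also configure credentials, probes, and replication.

```python
import json
from typing import Optional


def headless_service(name: str, app: str, port: int) -> dict:
    """Headless Service (clusterIP: None) that gives each pod a stable DNS name."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"app": app}},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": app},
            "ports": [{"name": "db", "port": port}],
        },
    }


def statefulset(name: str, app: str, service: str, image: str, port: int,
                replicas: int, storage: str, storage_class: Optional[str] = None) -> dict:
    """StatefulSet whose volumeClaimTemplates request one PVC per pod."""
    claim_spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": storage}},
    }
    if storage_class is not None:
        # Without this field, the cluster's default StorageClass (if any) is used.
        claim_spec["storageClassName"] = storage_class
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": service,  # must name the headless Service
            "replicas": replicas,
            "selector": {"matchLabels": {"app": app}},
            "template": {
                "metadata": {"labels": {"app": app}},
                "spec": {
                    "containers": [{
                        "name": app,
                        "image": image,
                        "ports": [{"name": "db", "containerPort": port}],
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            "volumeClaimTemplates": [
                {"metadata": {"name": "data"}, "spec": claim_spec},
            ],
        },
    }


def claim_names(sts: dict, replicas: int) -> list[str]:
    """PVC names the controller creates: <template-name>-<statefulset-name>-<ordinal>."""
    name = sts["metadata"]["name"]
    return [f'{tpl["metadata"]["name"]}-{name}-{i}'
            for tpl in sts["spec"]["volumeClaimTemplates"]
            for i in range(replicas)]


svc = headless_service("postgres", app="postgres", port=5432)
sts = statefulset("postgres", app="postgres", service="postgres",
                  image="registry.example.com/db:1.0",  # hypothetical image
                  port=5432, replicas=3, storage="10Gi")
# A v1 List can be piped to `kubectl apply -f -`.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [svc, sts]}, indent=2))

# Scaling from 3 replicas to 1 removes pods postgres-2 and postgres-1,
# but by default their PVCs are kept and must be cleaned up separately.
leftover = [c for c in claim_names(sts, 3) if c not in claim_names(sts, 1)]
print(leftover)  # ['data-postgres-1', 'data-postgres-2']
```

Keeping the claims is deliberate: if you scale back up, `postgres-1` reattaches to `data-postgres-1` and finds its previous data, which is what a database replica needs.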
For specific examples of database deployment with Kubernetes, check out the following resources:
- Setting up MySQL Replication Clusters in Kubernetes (2017)
- Configure a SQL Server container in Kubernetes for high availability (2017)
- Deploying PostgreSQL Clusters using Kubernetes StatefulSets (2018)
- PostgreSQL Kubernetes: How to run HA Postgres on Kubernetes (2018)
Traditional transactional databases were designed to provide a compromise between consistency, availability and partition tolerance, in keeping with the philosophy behind the CAP theorem discussed above. Yet the scalability challenges introduced by the trend toward NoSQL databases have set new standards for availability.
Kubernetes and containers are an excellent way to meet those challenges. Although “stateless” was once the operative word associated with containers, that is no longer the case, and Kubernetes today can deploy databases just as effectively as stateless applications.