Capacity planning is a critical step in successfully building and deploying a stable and cost-effective infrastructure. The need for proper resource planning is amplified within a Kubernetes cluster, as it does hard checks and will kill and move workloads around without hesitation and based on nothing but current resource usage.
This article will highlight areas that are important to consider, such as: how many DaemonSets are deployed, if a service mesh is involved, and if quotas are being actively used. Focusing on these areas when capacity planning makes it much easier to calculate the minimum requirements for a cluster that will allow everything to run.
Tuning beyond the minimums is best determined by application performance profiling based on projected usage; but that is a whole other topic that includes everything from using benchmarking as a service to load testing tools to watching real trends with application performance management suites.
Where there is Kubernetes, there is etcd
At the core of almost every Kubernetes deployment is an etcd cluster. In many distributions, this starts as a three-node etcd cluster that is co-located on the management nodes. It can often grow into a five-node dedicated cluster once the Kubernetes cluster starts to see any substantial load. The largest etcd clusters will become constrained by CPU limitations, but for most moderate-sized clusters, two or four CPU cores are enough and disk I/O will be far more critical; therefore, making sure the fastest disks available are used.
When dealing with the largest cloud providers, IOPS increases with the size of the disk provisioned; so often, you will need far more disk space provisioned than required to meet the performance numbers required.
To start to plan the capacity required to run your cluster, getting etcd taken care of is critical. To start off with a small etcd cluster serving under 200 Kubernetes nodes will be three servers with two cores each, 8GB of RAM, 20GB of disk space per node, and greater than 3000 concurrent IOPS.
More details on how to plan for the capacity requirements for etcd are available in the official ops guide on etcd.io.
Kubernetes Management Nodes (Control Plane)
The management nodes that are deployed as part of any Kubernetes cluster are not the most resource-intensive applications you will ever run because a lot of their performance and capacity requirements are tied to their underpinning etcd instance. This is why a lot of public clouds charge a nominal fee, or no fee at all, for the nodes.
To be highly available, a minimum of two management nodes are required. But since many smaller clusters have the underpinning etcd cluster co-located on the management nodes, the control plane becomes a three-node cluster. You could have a single node for both etcd and the management control plane, but that increases risk, as all management capabilities will be unavailable as you recover the single node. At one point, Google Kubernetes Engine (GKE) used this configuration for etcd when the cluster was under 500 nodes.
On small clusters with under 100 nodes, the control plane will fit into the unused capacity from the etcd cluster as long as the nodes have four CPU cores and 16GB of memory, which is what is commonly recommended by all major distributions.
As a cluster grows, the number of deployments that need to be managed will increase, as will the number of incoming requests. While each of the manager functions only use a few hundred MB of memory, as the number of connections grows, there is also a need to scale the number of apiserver instances. They need to scale faster than other components to avoid becoming a bottleneck for all communications. Each instance can only handle a set number of requests at a time which defaults to 400.
Kubernetes Worker Nodes (Memory and Storage vs CPU cores)
The actual deployed applications are running in pods on the worker nodes. This includes DaemonSets, which run on every node and regular applications that have a set number of replicas that can be configured to dynamically change with autoscaling. For the sake of simplicity, though, we will leave autoscaling out of the basic planning since that is just an additive to whatever capacity plans are created.
With this scheduling algorithm, it is important to remember that CPU is compressible, Unless it is reserved by a quota, Kubernetes will just keep piling things onto the same nodes, regardless if CPU is overcommitted or not.
Memory and storage are considered incompressible; so if the space is not available, the scheduling will fail on that node or storage device. Even the autoscaler functionality in Kubernetes uses memory as its default metric to know when to add additional pods or even nodes. There are two types of quotas on a pod: the requested amount, which is the minimum; and the limit, which is the maximum. If a pod asks for memory above its request but below its limit, the scheduler will grant the request.
Kubernetes schedules applications across worker nodes in a round-robin fashion. If the requested memory size isn’t available, it will fail to schedule; then the scheduler will move to the next node and continue until it finds a host with the available resources. The same logic will apply if quotas are in use for CPU and there aren’t enough uncommitted CPU cycles.
An additional consideration is that the default number of pods that can run on a single host is 100. In some instances it can be increased to 250, but in other instances, like AKS, it is limited to 30 pods on a host. This metric will make a big difference in what types of compute instances you can use, which will impact calculations for everything.
For example, if you calculated 3000 pods in an AKS cluster and you’ll also be running Fluent, Prometheus, and Dynatrace as DaemonSets, then you’ll have a working limit of 27 pods on each node. This leads to the requirement of 112 nodes at no extra capacity, just based on the per-node limit, and not counting CPU usage or physical memory limitations.
When capacity planning for the Kubernetes worker nodes, after you know your per-node limit, calculate the amount of memory you will need per node. Ideally, everything will have quotas set, which will allow for exact calculations. There are multiple ways to set default resource limits, but for our purposes here, we will assume the average pod will have a requested memory of 256Mb. So, the same 3000 pods will require 750GB of total memory across all the nodes, plus the 1GB of base per-node memory already required (kubelet, plus the three DaemonSets listed above). This means on AKS, each of the 112 nodes would need 8GB of RAM at the absolute minimum. But GKE or OpenShift (with 100 pods per node as the default limit) would have 31 nodes with 26GB of RAM minimum.
While capacity management is often its own specialization in the world of IT planning, once the base levels for the clusters your organization will deploy have been established, maintaining the most efficiently-sized clusters is not as difficult as it is with many other platforms. Due to the nature of containers and how Kubernetes was architected for scaling from ‘Day One,’ it only takes some straightforward math to calculate how much to increase or decrease workloads to maintain the most efficient use of resources. This also minimizes wasted compute time, which can save your organization a lot of money over time, as the public cloud can be expensive if you aren’t careful.