Optimizing Kubernetes on Azure: A Comprehensive Guide

When running a production-ready Kubernetes environment on Azure, it is essential to focus on cost optimization to ensure efficient resource utilization and to minimize operational expenses (OPEX). This guide walks through strategies and best practices for optimizing the cost of Kubernetes clusters on Azure.

Cluster Autoscaler: Dynamically Scaling Nodes

The cluster autoscaler (CA) is a powerful tool for managing the scaling of nodes in an AKS cluster. By defining a minimum and maximum number of nodes, the CA can dynamically adjust the cluster size based on the workload demands. The CA checks for pending pods or empty nodes every 10 seconds and scales the cluster accordingly.

During periods of low to normal traffic, the cluster can run at the minimum node count, keeping your workloads operational at the lowest cost. As traffic increases, the CA scales the cluster up, bounded by the configured maximum, to handle the additional load. Conversely, when the workload decreases, the CA removes unneeded nodes after they have been idle for more than 10 minutes, reducing the overall cost of the cluster.
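The scaling bounds described above are set per node pool. As a minimal sketch, the cluster autoscaler can be enabled and tuned with the Azure CLI; the resource group and cluster names here are hypothetical placeholders:

```shell
# Hypothetical resource group and cluster names -- adjust for your environment.
# Enable the cluster autoscaler on an existing AKS cluster,
# keeping between 3 and 10 nodes.
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10

# Later, adjust the bounds as traffic patterns change.
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --update-cluster-autoscaler \
  --min-count 1 \
  --max-count 5
```

Choosing a realistic minimum is the main cost lever here: the CA will never scale below it, so an over-generous floor means paying for idle nodes around the clock.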

Horizontal Pod Autoscaler: Scaling Application Instances

The horizontal pod autoscaler (HPA) complements the cluster autoscaler by automatically scaling the number of application instances (pods) based on observed CPU or memory utilization, or on custom metrics. The HPA scales the number of replicas between a defined minimum and maximum, ensuring that your application can handle increased load without over-provisioning resources.

By enabling both the CA and HPA in your AKS cluster, you can achieve a well-balanced and cost-effective scaling strategy. The HPA handles short-term spikes in demand by scaling out the application instances, while the CA provisions additional nodes to handle the increased workload over a longer period.
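A minimal sketch of configuring the HPA with kubectl, assuming a hypothetical deployment named web-frontend already exists in the cluster:

```shell
# Hypothetical deployment name -- adjust for your workload.
# Scale 'web-frontend' between 3 and 20 replicas,
# targeting 50% average CPU utilization across pods.
kubectl autoscale deployment web-frontend \
  --cpu-percent=50 \
  --min=3 \
  --max=20

# Inspect current replica count and observed utilization.
kubectl get hpa web-frontend
```

For CPU-based scaling to work, the pods must declare CPU resource requests; the HPA computes utilization as a percentage of the requested amount.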

Azure Container Instances Connector: Bursting to the Cloud

Another option for scaling your applications is the Azure Container Instances (ACI) connector. The ACI connector, implemented as a Virtual Kubelet in your AKS cluster, appears as a virtual agent node with effectively unlimited capacity, bounded only by your Azure subscription limits. This allows your AKS cluster to quickly spin up new pods in ACI to handle bursts of traffic, without provisioning additional nodes in the cluster.

The ACI connector can be a useful complement to the cluster autoscaler, as it allows your application to scale out more rapidly than waiting for the CA to provision new nodes. This can be particularly beneficial for applications that experience sudden, high-volume spikes in demand.
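On current AKS versions the ACI connector is surfaced as the virtual node add-on. A hedged sketch of enabling it, assuming a cluster that uses Azure CNI networking and a dedicated, empty subnet delegated to ACI (all names here are hypothetical):

```shell
# Hypothetical names -- the virtual node add-on requires an AKS cluster
# using Azure CNI networking, with a dedicated empty subnet for ACI pods.
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons virtual-node \
  --subnet-name myVirtualNodeSubnet

# The virtual node appears alongside the regular agent nodes.
kubectl get nodes
```

Pods are only placed on the virtual node if they explicitly tolerate it (via a nodeSelector and toleration for the virtual-kubelet node), so bursting to ACI is opt-in per workload rather than automatic.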

Selecting Optimal Node VM Sizes

The choice of virtual machine (VM) size for your AKS node pool can have a significant impact on the operational cost of your Kubernetes environment. The general recommendation for production clusters is the SSD-backed Dsv3 series, which offers a balanced CPU-to-memory ratio, or the memory-optimized Esv3 series for workloads that need more memory per vCPU.

For most workloads, the Standard_D2s_v3 (2 vCPUs, 8 GB memory) or Standard_D4s_v3 (4 vCPUs, 16 GB memory) VM sizes are suitable. If your application requires a higher memory-to-CPU ratio, consider the Standard_E2s_v3 (2 vCPUs, 16 GB memory) or Standard_E4s_v3 (4 vCPUs, 32 GB memory) VM sizes.

For development and testing environments, you can use the more cost-effective Standard_B2ms, Standard_B4ms, or Standard_B8ms VM sizes, which are part of the burstable B-series.

It’s also valuable to engage with the application developers to understand the specific resource requirements (GPU, CPU, memory, IOPS) for the workloads that will run on the AKS cluster. This information can help you select the appropriate VM sizes to match the application needs.
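Once the workload requirements are understood, differently sized node pools can be added to the same cluster so that each class of workload lands on appropriate hardware. A minimal sketch with hypothetical names:

```shell
# Hypothetical names -- match the VM size to measured workload needs.
# Add a user node pool of memory-optimized nodes for a memory-hungry service.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mempool \
  --node-count 2 \
  --node-vm-size Standard_E4s_v3

# List available VM sizes in a region before deciding.
az vm list-sizes --location eastus --output table
```

Workloads can then be steered to the right pool with node selectors or taints and tolerations, keeping the cheaper general-purpose pool for everything else.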

Conclusion

By leveraging the cluster autoscaler, horizontal pod autoscaler, Azure Container Instances connector, and selecting the optimal VM sizes, you can significantly optimize the cost of running Kubernetes on Azure. These strategies allow you to dynamically scale your cluster and application instances to match the demands of your workloads, while minimizing resource waste and operational expenses.

For more information, see Guidelines for cost optimization - Kubernetes on Azure, from which the content of this article was sourced.