
Creating an Additional AKS Node Pool with Terraform/OpenTofu (Step-by-Step)

Adding node pools to an AKS cluster with Terraform is one of the most practical ways to scale and isolate workloads in Azure Kubernetes Service. In this guide, we’ll create a fully functional user node pool and deploy workloads to it using Terraform/OpenTofu.

Azure Kubernetes Service (AKS) becomes dramatically more flexible when you split workloads across multiple node pools. A single system node pool is fine for demos or tiny clusters — but for real architectures, you need dedicated node groups, different VM sizes, isolated workloads, and optional autoscaling.

In this article, we build exactly that: a separate, dedicated user node pool, fully automated with Terraform/OpenTofu.

We’ll create a node pool with custom labels and taints, generate a Kubernetes manifest whose pods land only on this pool, and validate everything in the Azure Portal.

When you build AKS clusters with Infrastructure-as-Code, managing additional node pools in Terraform becomes essential for isolating workloads and letting teams scale independently.

Why an Additional Node Pool?

In many production architectures you want:

  • Different VM sizes (compute-optimized, memory-optimized, GPU nodes)

  • Workload isolation (system pods vs. application pods)

  • Taints to guarantee scheduling boundaries

  • Node selectors or affinity rules to pin workloads to specific pools

  • Separate autoscaling behavior

Terraform makes this repeatable, predictable, and version-controlled.

1. Defining the Additional Node Pool in Terraform

We start with a simple variable representing a list of extra node pools:

variable "additional_node_pools" {
  description = "Additional Node Pool definition"
  default = [
    {
      name                 = "userpool"
      vm_size              = "Standard_D2s_v3"
      node_count           = 2
      mode                 = "User"
      orchestrator_version = null
      subnet_id            = null
      taints               = ["dedicated=user:NoSchedule"]
      labels               = { workload = "apps", sku = "general" }
      max_pods             = 30
      enable_auto_scaling  = false
      min_count            = null
      max_count            = null
      spot                 = false
    }
  ]
}
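The variable above has no type constraint, so mistakes such as a misspelled attribute or a string where a number is expected are only caught (if at all) inside the module. A minimal sketch of a typed version of the same variable, assuming Terraform 1.3+ or OpenTofu (both support optional() defaults in object types). The naming rule in the first validation assumes AKS's limit for Linux node pools: lowercase letters and digits, starting with a letter, at most 12 characters.

```hcl
variable "additional_node_pools" {
  description = "Additional node pool definitions"
  type = list(object({
    name                 = string
    vm_size              = string
    node_count           = optional(number, 1)
    mode                 = optional(string, "User")
    orchestrator_version = optional(string) # null = leave unset
    subnet_id            = optional(string) # null = leave unset
    taints               = optional(list(string), [])
    labels               = optional(map(string), {})
    max_pods             = optional(number, 30)
    enable_auto_scaling  = optional(bool, false)
    min_count            = optional(number)
    max_count            = optional(number)
    spot                 = optional(bool, false)
  }))

  # Omitted attributes fall back to the optional() defaults above
  default = [
    {
      name       = "userpool"
      vm_size    = "Standard_D2s_v3"
      node_count = 2
      taints     = ["dedicated=user:NoSchedule"]
      labels     = { workload = "apps", sku = "general" }
    }
  ]

  # Assumed AKS rule for Linux pool names: [a-z][a-z0-9], max 12 characters
  validation {
    condition = alltrue([
      for pool in var.additional_node_pools : can(regex("^[a-z][a-z0-9]{0,11}$", pool.name))
    ])
    error_message = "Node pool names must be 1-12 lowercase alphanumeric characters and start with a letter."
  }

  # Autoscaling needs explicit bounds
  validation {
    condition = alltrue([
      for pool in var.additional_node_pools :
      !pool.enable_auto_scaling || (pool.min_count != null && pool.max_count != null)
    ])
    error_message = "min_count and max_count are required when enable_auto_scaling is true."
  }
}
```

With the defaults in place, each list entry only has to spell out what differs from a plain, fixed-size user pool.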

We pass it into the AKS module:

module "aks" {
  source              = "github.com/foggykitchen/terraform-az-fk-aks"
  name                = "fk-aks-extra"
  resource_group_name = azurerm_resource_group.foggykitchen_rg.name
  location            = azurerm_resource_group.foggykitchen_rg.location

  create_networking   = true
  network_plugin      = "kubenet"

  additional_node_pools = var.additional_node_pools
}

This tells Terraform to build an AKS cluster plus a second node pool named userpool. The module exposes an additional_node_pools input that accepts a list of objects, each describing one pool’s VM size, node count, labels, taints, and scaling settings.
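The article does not show how the module turns this list into node pools. A plausible pattern, shown here as a hypothetical sketch rather than the module’s actual code, is one azurerm_kubernetes_cluster_node_pool resource per list entry via for_each. It assumes azurerm provider 3.x argument names (4.x renamed enable_auto_scaling to auto_scaling_enabled), a module input with the same name as the root variable, and a cluster resource hypothetically named azurerm_kubernetes_cluster.this.

```hcl
# Hypothetical module internals: one node pool resource per list entry
resource "azurerm_kubernetes_cluster_node_pool" "additional" {
  # Key by pool name so adding or removing one pool never re-creates the others
  for_each = { for pool in var.additional_node_pools : pool.name => pool }

  name                  = each.value.name
  kubernetes_cluster_id = azurerm_kubernetes_cluster.this.id # hypothetical cluster resource
  vm_size               = each.value.vm_size
  mode                  = each.value.mode
  orchestrator_version  = each.value.orchestrator_version
  vnet_subnet_id        = each.value.subnet_id # null = argument left unset
  node_taints           = each.value.taints
  node_labels           = each.value.labels
  max_pods              = each.value.max_pods

  # azurerm 3.x argument name; azurerm 4.x calls it auto_scaling_enabled
  enable_auto_scaling = each.value.enable_auto_scaling
  node_count          = each.value.node_count
  min_count           = each.value.enable_auto_scaling ? each.value.min_count : null
  max_count           = each.value.enable_auto_scaling ? each.value.max_count : null

  # Spot pools (spot = true) would also need priority, eviction_policy,
  # spot_max_price and the AKS spot taint/label; omitted here for brevity.
}
```

Keying for_each by pool name, instead of using count, means that removing the first entry from the list does not shift indexes and force Terraform to replace every pool after it.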

2. Architecture Overview of the AKS Additional Node Pool (Terraform)

This architecture shows a clean AKS setup extended with a dedicated user node pool.
Here is what you see in the diagram:

  • Virtual network (fk-aks-demo-vnet) defining the IP space for the cluster

  • Subnet for AKS nodes (fk-aks-demo-subnet) hosting both system and user pools

  • Default system pool used only for core Kubernetes components

  • Additional user pool (“userpool”) created via Terraform with its own VM size, labels and taints — ensuring only application workloads land there

  • Optional Azure Container Registry on the left for pulling container images securely

In short: the design illustrates how AKS can be extended with extra compute groups without touching the system pool — a recommended pattern for scalable, production-grade clusters.

The diagram below shows how our Terraform configuration provisions the additional node pool as a separate compute group inside the same cluster.

Figure 1. AKS cluster with an additional node pool

3. Kubernetes Manifest Pinned to the Node Pool

To ensure workloads land only on the new node pool, we include:

  • Tolerations → match the taints from Terraform

  • NodeSelector → match node labels

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-on-userpool
spec:
  replicas: 2
  selector:
    matchLabels: { app: demo }
  template:
    metadata:
      labels: { app: demo }
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "user"
          effect: "NoSchedule"
      nodeSelector:
        workload: "apps"
      containers:
        - name: web
          image: nginx:stable
          ports: [{ containerPort: 80 }]

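The two settings do different jobs: the toleration allows these pods onto the tainted userpool nodes, and the nodeSelector keeps them off every node that lacks the workload=apps label. As long as no other pool carries that label, the scheduler places the pods exclusively on “userpool”, even if other pools exist.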

4. Local Execution: kubectl apply

The Terraform configuration writes the manifest to a local file and applies it with kubectl from a local-exec provisioner:

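resource "null_resource" "kubectl_apply" {
  depends_on = [
    module.aks,
    local_file.app-on-userpool
  ]

  # Re-run the provisioner whenever the rendered manifest changes
  triggers = {
    manifest_sha = sha256(local_file.app-on-userpool.content)
  }

  provisioner "local-exec" {
    # 1) merge cluster credentials into kubeconfig, 2) show pool membership and labels,
    # 3) apply the manifest, 4) wait for the rollout, 5) print pod placement
    command = join(" && ", [
      "az aks get-credentials -g ${azurerm_resource_group.foggykitchen_rg.name} -n ${module.aks.cluster_name} --overwrite-existing",
      "kubectl get nodes -L agentpool,workload,sku",
      "kubectl apply -f ${path.module}/generated/app-on-userpool.yaml",
      "kubectl rollout status deployment/app-on-userpool --timeout=180s",
      "kubectl get pods -o wide"
    ])
  }
}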

Terraform configures kubectl, deploys the manifest, and prints pod placement.
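The null_resource depends on local_file.app-on-userpool, whose definition the article does not show. One way to produce generated/app-on-userpool.yaml, sketched here as an assumption rather than the article’s actual code, is the hashicorp/local provider’s local_file resource combined with Terraform’s built-in yamlencode() function. Deriving the toleration and nodeSelector from the same variable keeps the manifest in sync with the pool definition; the sketch assumes the first list entry is userpool and its first taint has the key=value:effect form.

```hcl
locals {
  # Assumption: entry 0 is "userpool" and taints[0] is "dedicated=user:NoSchedule"
  userpool = var.additional_node_pools[0]

  # Split "key=value:effect" into ["key", "value", "effect"]
  taint_parts = regex("^([^=]+)=([^:]*):(.+)$", local.userpool.taints[0])
}

resource "local_file" "app-on-userpool" {
  filename = "${path.module}/generated/app-on-userpool.yaml"

  content = yamlencode({
    apiVersion = "apps/v1"
    kind       = "Deployment"
    metadata   = { name = "app-on-userpool" }
    spec = {
      replicas = 2
      selector = { matchLabels = { app = "demo" } }
      template = {
        metadata = { labels = { app = "demo" } }
        spec = {
          # Toleration mirrors the pool's taint
          tolerations = [{
            key      = local.taint_parts[0]
            operator = "Equal"
            value    = local.taint_parts[1]
            effect   = local.taint_parts[2]
          }]
          # nodeSelector mirrors the pool's workload label
          nodeSelector = { workload = local.userpool.labels["workload"] }
          containers = [{
            name  = "web"
            image = "nginx:stable"
            ports = [{ containerPort = 80 }]
          }]
        }
      }
    }
  })
}
```

yamlencode() sorts object keys, so the generated file will not match the hand-written YAML line for line, but it describes the same Deployment.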

5. Validating in Azure Portal

After deployment, we open the Azure Portal and go to the AKS cluster → Node pools.

Figure 2. The new node pool (“userpool”) visible in Azure Portal

Everything matches what Terraform defined:

  • Mode: User

  • VM size: Standard_D2s_v3

  • Taint: dedicated=user:NoSchedule

  • Labels: workload=apps, sku=general

  • Node count: 2

In Workloads → Deployments, we find:

  • app-on-userpool running

  • 2 replicas placed on nodes belonging to the userpool

  • Clean scheduling based on node selectors and tolerations

This confirms that our Terraform + YAML combination behaves exactly as expected.
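The portal walkthrough can also be codified. A minimal sketch, assuming Terraform 1.5+ or OpenTofu 1.6+ (both support check blocks) and that the azurerm_kubernetes_cluster_node_pool data source exports mode, vm_size, node_taints, and node_labels as in recent provider releases. A failing check is reported as a warning and does not stop the apply.

```hcl
check "userpool_matches_spec" {
  # Scoped data source: read the pool back as Azure actually created it
  data "azurerm_kubernetes_cluster_node_pool" "userpool" {
    name                    = "userpool"
    kubernetes_cluster_name = module.aks.cluster_name
    resource_group_name     = azurerm_resource_group.foggykitchen_rg.name
  }

  assert {
    condition     = data.azurerm_kubernetes_cluster_node_pool.userpool.mode == "User"
    error_message = "userpool is not running in User mode."
  }

  assert {
    condition     = data.azurerm_kubernetes_cluster_node_pool.userpool.vm_size == "Standard_D2s_v3"
    error_message = "userpool has an unexpected VM size."
  }

  assert {
    condition = contains(
      data.azurerm_kubernetes_cluster_node_pool.userpool.node_taints,
      "dedicated=user:NoSchedule"
    )
    error_message = "userpool is missing the dedicated=user:NoSchedule taint."
  }

  assert {
    condition     = lookup(data.azurerm_kubernetes_cluster_node_pool.userpool.node_labels, "workload", "") == "apps"
    error_message = "userpool is missing the workload=apps label."
  }
}
```

Because checks run on every plan and apply, this also flags drift if someone edits the pool by hand in the portal later.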

6. Read more about production AKS patterns with Terraform

If you’re following the AKS Terraform series, here are the previous articles:

🔗 AKS Kubenet vs Azure CNI — Networking trade-offs explained with Terraform

Understand how AKS networking choices impact Pod IP addressing, traffic flow, scalability, and what you actually observe in Azure Monitor. This guide explains the real production trade-offs between Kubenet and Azure CNI using Terraform examples.

🔗 AKS + Azure Container Registry with Terraform — Secure image supply chain for production clusters

Learn how to provision Azure Container Registry and integrate it with AKS using Terraform/OpenTofu. This guide covers private image pulls, secure authentication, and the baseline container supply chain for production AKS environments.

🔗 Persistent Volumes in AKS with Terraform — The Role of Azure Managed Disks

Understand how AKS provisions persistent storage using the Azure CSI driver and how to automate disk-backed PersistentVolumes with Terraform/OpenTofu. This is the baseline pattern for running stateful workloads on AKS in production.

🔗 Azure Bastion with Terraform — Secure Access to Private AKS Clusters

A hands-on guide to deploying Azure Bastion with Terraform — including the required subnets, NSG rules, and a practical workflow for connecting securely to private AKS nodes. If you’re planning a private AKS cluster, this article explains the exact infrastructure you will need. It also includes screenshots and troubleshooting steps directly from the Azure Portal.

7. What’s Next? Auto-Scaling Node Pools

Adding a fixed-size node pool is just the beginning.

In the next article, we will create an autoscaling node pool, where the cluster autoscaler adds nodes when pods cannot be scheduled for lack of capacity and removes nodes that stay underutilized.

This is one of the most important skills for production-grade AKS operations — and we will build it step by step with Terraform/OpenTofu.
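As a preview, and assuming the module honors the autoscaling fields already present in additional_node_pools (the next article may structure this differently), an autoscaling entry in terraform.tfvars could look like the sketch below. The pool name, taint, label, and bounds are hypothetical; they differ from userpool so the two pools keep separate scheduling boundaries.

```hcl
additional_node_pools = [
  {
    name                 = "batchpool"                    # hypothetical pool name
    vm_size              = "Standard_D2s_v3"
    node_count           = 1                              # initial size; the autoscaler adjusts it afterwards
    mode                 = "User"
    orchestrator_version = null
    subnet_id            = null
    taints               = ["dedicated=batch:NoSchedule"] # hypothetical taint
    labels               = { workload = "batch", sku = "general" }
    max_pods             = 30
    enable_auto_scaling  = true
    min_count            = 1                              # never scale below one node
    max_count            = 5                              # hypothetical upper bound
    spot                 = false
  }
]
```

With this entry, the cluster autoscaler keeps the pool between min_count and max_count nodes, growing it only when pods that tolerate the batch taint are stuck in Pending.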

⚡Course: “Azure Kubernetes Service (AKS) with Terraform/OpenTofu — Hands-On Fundamentals (2025 Edition)”

This blog post is part of the AKS course, where we go far deeper into:

  • AKS networking (Kubenet, Azure CNI, Overlay, dual-stack)

  • Node pools (system/user, taints, labels, tolerations, autoscaling)

  • ACR integration and CI/CD workflows

  • Observability and Log Analytics

  • Storage, identity, RBAC, and production-ready architecture patterns

 


Scale and Optimize AKS with Terraform/OpenTofu

Learn how to design and automate advanced AKS node pool strategies — including system vs. user pools, workload isolation, taints & tolerations, autoscaling, spot nodes, and GPU workloads — all provisioned with Terraform/OpenTofu.

🔒 Lifetime • ⏱️ Self-paced • 🧪 Real labs
