Terraform and Kubernetes: Where to Draw the Line

There’s a question that comes up in a lot of infrastructure teams sooner or later: should we manage our Kubernetes resources with Terraform?

I’ve spent quite some time thinking about this… and I would say no.

I get the appeal. Terraform is excellent at what it does. It tracks state, it handles dependencies, and it gives you a single declarative workflow for provisioning cloud resources.

When you discover that there’s a Kubernetes provider for Terraform, it feels natural to extend that workflow into your cluster. One tool, one pipeline, one state file. What could possibly go wrong?

I’m also aware that the reverse exists: Google’s Config Connector, for example, lets you manage cloud resources from within Kubernetes using custom resources. It’s a clever idea, but it suffers from the same fundamental problem in the opposite direction. Kubernetes is not the right tool for managing VPCs, IAM policies, and DNS zones. Just because you can doesn’t mean you should.

Both of these approaches blur a line that mature infrastructure teams should keep sharp.

The Line: Infrastructure vs. Application

The principle is straightforward:

  • Terraform owns infrastructure. Everything that exists to support your workloads: cloud accounts, networking, firewall rules, managed databases, IAM roles, the Kubernetes cluster itself, and its node pools.
  • Kubernetes owns the application layer. Everything that runs on top of the cluster: Deployments, Services, ConfigMaps, Ingress rules, CRDs, operators, and Helm releases.

The cluster nodes are the boundary. Terraform provisions the machine. Kubernetes decides what runs on it.

This isn’t arbitrary. It reflects a real difference in how these two layers behave.

Why the Layers Don’t Mix

Infrastructure and application workloads have fundamentally different characteristics, and trying to manage them with the same tool creates friction everywhere.

They change at different speeds. A VPC might change once a quarter. A Deployment might change ten times a day. Forcing both through the same Terraform plan/apply cycle means either your infrastructure pipeline runs far too often, or your application deployments are far too slow.

They have different failure modes. When a Terraform apply fails halfway through creating a subnet, you fix the config and re-apply. When a Kubernetes Deployment fails, you need rollback, health checks, and progressive delivery…none of which Terraform provides.

They have different owners. Platform teams manage infrastructure. Application teams manage workloads. These teams have different permissions, different risk tolerances, and different release cadences. A shared Terraform pipeline becomes an organizational bottleneck.

Kubernetes fights Terraform for control. Kubernetes is a reconciliation engine. Controllers, operators, admission webhooks, and the scheduler all mutate resources constantly. Terraform expects to be the sole author of state. Every terraform plan shows drift that isn’t real drift; it’s Kubernetes doing its job. It’s a fundamental mismatch between the two systems.

CRDs are a nightmare in Terraform. The Kubernetes provider’s kubernetes_manifest resource exists specifically to handle arbitrary custom resources. In practice, it’s fragile. Server-side apply conflicts, plan-time unknowns, and schema validation issues make it unreliable for anything beyond the simplest cases. If you’re running Istio, Argo, Crossplane, or any operator-heavy stack, you’ll feel this pain immediately.
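
For a concrete sense of the shape this takes, here is roughly what managing a custom resource through kubernetes_manifest looks like (a cert-manager Certificate as a hypothetical example; names and values are purely illustrative):

```hcl
# Illustrative only: a cert-manager Certificate expressed as a Terraform
# kubernetes_manifest resource. The CRD must already exist in the cluster at
# plan time, which is exactly where the plan-time-unknown problems start.
resource "kubernetes_manifest" "example_certificate" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Certificate"
    metadata = {
      name      = "example-tls"
      namespace = "default"
    }
    spec = {
      secretName = "example-tls"
      dnsNames   = ["example.com"]
      issuerRef = {
        name = "letsencrypt" # hypothetical issuer name
        kind = "ClusterIssuer"
      }
    }
  }
}
```

It works until a controller mutates the object, the schema changes, or the CRD isn’t installed yet when the plan runs.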

What Terraform Should Own

Use Terraform for everything up to the cluster itself:

  • Cloud provider accounts and organization structure
  • Networking: VPCs, subnets, peering, VPNs, NAT gateways
  • DNS zones and base records
  • IAM roles, policies, and service accounts
  • Managed services: RDS, ElastiCache, S3 buckets, Pub/Sub topics
  • The Kubernetes cluster and its node pools
  • Cluster-level authentication and OIDC configuration

You can also reasonably use Terraform for a thin layer of “cluster bootstrap” resources that are tightly coupled to infrastructure decisions and rarely change: namespaces, RBAC policies, ResourceQuotas, NetworkPolicies, and PriorityClasses. These are platform concerns, not application concerns, and they change at infrastructure speed.
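
To make that concrete, a minimal sketch of the Terraform side might look like this (GKE as an example; all names, regions, and sizes are illustrative, not prescriptive):

```hcl
# Terraform owns the cluster and its node pools.
resource "google_container_cluster" "primary" {
  name     = "prod-cluster"
  location = "europe-west1"

  # Manage node pools as separate resources rather than inline.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "default" {
  name     = "default-pool"
  cluster  = google_container_cluster.primary.name
  location = google_container_cluster.primary.location

  node_count = 3

  node_config {
    machine_type = "e2-standard-4"
  }
}

# The thin bootstrap layer: platform-owned objects that change at
# infrastructure speed, such as a namespace the platform team controls.
resource "kubernetes_namespace" "platform" {
  metadata {
    name = "platform"
  }
}
```

Everything below that namespace, the workloads that actually run in it, stays out of Terraform state.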

What Kubernetes Should Own

Everything that runs inside the cluster belongs to Kubernetes-native tooling:

  • Deployments, StatefulSets, DaemonSets, Jobs
  • Services, Ingress, and Gateway API resources
  • ConfigMaps and Secrets
  • Helm releases
  • Custom resources (Istio VirtualServices, Argo Rollouts, Cert-Manager Certificates)
  • Operators and their managed resources
  • HorizontalPodAutoscalers and PodDisruptionBudgets

For this layer, use a GitOps tool like ArgoCD or Flux.

The Handoff

The cleanest pattern looks like this:

  1. Terraform provisions the cluster and outputs connection details (endpoint, CA cert, OIDC issuer).
  2. Terraform optionally bootstraps the GitOps controller itself (installing ArgoCD or Flux via Helm) — this is the one place where helm_release in Terraform makes sense, because you need something in the cluster before GitOps can take over.
  3. The GitOps controller takes ownership of everything else from that point forward.

This gives you a clear, auditable separation. Terraform state contains infrastructure. Git repositories contain application manifests. Each tool manages what it was designed to manage.
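
A sketch of that handoff, assuming the GKE cluster from the earlier example and the Helm provider configured against it (chart values omitted, names illustrative):

```hcl
# Step 1: expose the connection details the GitOps layer needs.
output "cluster_endpoint" {
  value = google_container_cluster.primary.endpoint
}

output "cluster_ca_certificate" {
  value     = google_container_cluster.primary.master_auth[0].cluster_ca_certificate
  sensitive = true
}

# Step 2: the one helm_release that belongs in Terraform, bootstrapping the
# GitOps controller itself. Everything it manages afterwards lives in Git.
resource "helm_release" "argocd" {
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  namespace        = "argocd"
  create_namespace = true
}
```

From step 3 onward, Terraform never touches another in-cluster object.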

Config Connector and Crossplane: The Anti-Pattern in the Other Direction

I should mention Config Connector and Crossplane, because they represent the same mistake in reverse. Instead of managing Kubernetes from Terraform, they manage cloud infrastructure from Kubernetes using CRDs.

This is, in my view, a serious anti-pattern: it crosses the line in the wrong direction.

Your cloud infrastructure predates your cluster. It outlives your cluster. Managing your VPC from inside a cluster that runs on that VPC is a circular dependency. If the cluster goes down, your infrastructure management plane goes with it. If you need to rebuild the cluster, you first need the infrastructure that the cluster was supposed to be managing. You’ve created a chicken-and-egg problem where none needed to exist.

Beyond the architectural issues, you’re also giving up Terraform’s mature ecosystem: state locking, plan/apply workflows, policy enforcement with Sentinel or OPA, and a battle-tested provider ecosystem covering every major cloud. In exchange, you get a Kubernetes operator that wraps cloud APIs in CRDs…adding a layer of abstraction and a new set of failure modes without solving a problem that Terraform hadn’t already solved better.

The argument for Crossplane is usually “our team knows Kubernetes, so let’s do everything in Kubernetes.” But familiarity with a tool is not a reason to use it for the wrong job. Kubernetes is an excellent container orchestrator. It is not a cloud resource manager. Terraform is.

Conclusion

The rule is simple: infrastructure belongs to Terraform, application workloads belong to Kubernetes. The cluster boundary is the line. Resist the temptation to reach across it in either direction.

This isn’t about tool loyalty. It’s about respecting the fact that infrastructure and applications are different things with different lifecycles, different owners, and different operational needs. The best infrastructure setups I’ve seen all share one trait: a clean, well-understood boundary between the platform layer and the workload layer, with the right tool on each side.

Draw the line. Keep it clean. Your on-call rotation will thank you.
