
kro: Kubernetes Operators Without the Code

kro (Kube Resource Orchestrator) is a Kubernetes SIG Cloud Provider subproject that landed at version 0.9 this year. The pitch is simple: define a ResourceGraphDefinition (RGD) in YAML, and kro generates a CRD plus a live controller for it, with no Go, no kubebuilder, and no code-gen. You get a real reconciliation loop watching real Kubernetes resources, derived from a YAML template and CEL expressions.

It deserves a closer look: where does it sit relative to Helm and proper operators, and — perhaps most interestingly — can you use it to build toy operators to learn the operator pattern without writing code?

How kro works

kro is itself an operator. It ships as a single controller Deployment that watches ResourceGraphDefinition objects. When you apply an RGD, kro's controller reads it, registers a new CRD in the cluster, and starts an instance controller for that CRD — all at runtime, without a restart, without a build. You are not generating a static CRD YAML and deploying a hand-written controller; kro is dynamically doing both on your behalf. The RGD is the source of truth; kro is the operator that turns it into a running API.

A ResourceGraphDefinition has two parts:

  • spec.schema — defines the CRD users interact with: field types, defaults, constraints, and what to surface in .status
  • spec.resources — the Kubernetes resources to create per instance, wired together with ${cel.expression} references

Consider a Redis RGD that creates a ConfigMap (for runtime params), a StatefulSet, a headless Service for peer discovery, and a regular Service for client access:

yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: redis
spec:
  schema:
    apiVersion: v1alpha1
    kind: Redis
    spec:
      replicas: integer | default=1
      maxMemory: string | default="256mb"
      maxMemoryPolicy: string | default="allkeys-lru"
    status:
      ready: ${statefulset.status.readyReplicas >= 1}
      readyReplicas: ${statefulset.status.readyReplicas}
      endpoint: ${schema.metadata.name + "." + schema.metadata.namespace + ".svc.cluster.local:6379"}

  resources:
    - id: config # no upstream CEL refs → first in DAG
      template:
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: ${schema.metadata.name}-config
        data:
          MAXMEMORY: ${schema.spec.maxMemory}
          MAXMEMORY_POLICY: ${schema.spec.maxMemoryPolicy}

    - id: statefulset # refs config.metadata.name → depends on config
      readyWhen:
        - ${statefulset.status.readyReplicas >= 1}
      template:
        apiVersion: apps/v1
        kind: StatefulSet
        metadata:
          name: ${schema.metadata.name}
        spec:
          serviceName: ${schema.metadata.name + "-headless"}
          replicas: ${schema.spec.replicas}
          selector:
            matchLabels:
              app: ${schema.metadata.name}
          template:
            metadata:
              labels:
                app: ${schema.metadata.name}
            spec:
              containers:
                - name: redis
                  image: redis:7-alpine
                  command:
                    [
                      "redis-server",
                      "--maxmemory",
                      "$(MAXMEMORY)",
                      "--maxmemory-policy",
                      "$(MAXMEMORY_POLICY)",
                    ]
                  ports:
                    - containerPort: 6379
                  envFrom:
                    - configMapRef:
                        name: ${config.metadata.name}

    - id: headless # refs statefulset selector → depends on statefulset
      template:
        apiVersion: v1
        kind: Service
        metadata:
          name: ${schema.metadata.name + "-headless"}
        spec:
          clusterIP: None
          selector: ${statefulset.spec.selector.matchLabels}
          ports:
            - port: 6379

    - id: client # refs statefulset selector → depends on statefulset
      template:
        apiVersion: v1
        kind: Service
        metadata:
          name: ${schema.metadata.name}
        spec:
          selector: ${statefulset.spec.selector.matchLabels}
          ports:
            - port: 6379

kro never sees an explicit ordering declaration. It parses every ${...} expression, identifies which resource ID each one references, and builds the dependency graph from that. config has no upstream references — it goes first. statefulset references config.metadata.name — it goes second, and kro waits for the readyWhen gate (readyReplicas >= 1) before proceeding. headless and client both reference statefulset.spec.selector.matchLabels — they depend on statefulset but not each other, so kro creates them in parallel.
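
To make the inference concrete, here is a minimal Python sketch of the idea. It is not kro's implementation (kro is written in Go and parses CEL properly); it assumes a trimmed-down view of the Redis RGD as a dict of resource IDs to the expression strings in their templates, and it uses a regex scan as a stand-in for real CEL analysis. The helper names (referenced_roots, build_graph, creation_waves) are invented for illustration.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed, trimmed-down view of the Redis RGD above:
# resource id -> strings from its template that contain ${...} expressions.
RESOURCES = {
    "config": ["${schema.metadata.name}-config", "${schema.spec.maxMemory}"],
    "statefulset": [
        "${statefulset.status.readyReplicas >= 1}",  # readyWhen refers to itself
        '${schema.metadata.name + "-headless"}',
        "${config.metadata.name}",
    ],
    "headless": ["${statefulset.spec.selector.matchLabels}"],
    "client": ["${statefulset.spec.selector.matchLabels}"],
}

EXPRESSION = re.compile(r"\$\{([^}]*)\}")  # body of each ${...}
STRING_LITERAL = re.compile(r'"[^"]*"')    # removed so "-headless" is not read as an id
ROOT_IDENTIFIER = re.compile(r"(?<![\w.])[A-Za-z_]\w*")  # first segment of a.b.c


def referenced_roots(text):
    """Return the leading identifiers used inside every ${...} in `text`."""
    roots = set()
    for body in EXPRESSION.findall(text):
        roots.update(ROOT_IDENTIFIER.findall(STRING_LITERAL.sub("", body)))
    return roots


def build_graph(resources):
    """Map each resource id to the set of other resource ids it references."""
    ids = set(resources)
    graph = {}
    for rid, texts in resources.items():
        roots = set().union(*(referenced_roots(t) for t in texts))
        graph[rid] = (roots & ids) - {rid}  # ignore `schema` and self-references
    return graph


def creation_waves(graph):
    """Group ids into waves; everything in one wave has all its dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    print(creation_waves(build_graph(RESOURCES)))
```

Run against the trimmed Redis data, the sketch yields three waves: config alone, then statefulset, then client and headless together, which matches the ordering described above. The readyWhen gate is not modelled here; it would sit between one wave and the next.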

A user creates a Redis instance with:

yaml
apiVersion: kro.run/v1alpha1
kind: Redis
metadata:
  name: session-cache
  namespace: default
spec:
  replicas: 1
  maxMemory: "512mb"
  maxMemoryPolicy: "volatile-lru"

kro creates four real Kubernetes resources in topological order and writes the client endpoint and readyReplicas count back to Redis.status. Patch maxMemory on the instance and kro reconciles the ConfigMap. Because the container reads those values through envFrom and the StatefulSet template itself has not changed, the running pod only picks up the new setting after it is restarted. Delete session-cache-headless manually and kro recreates it on its next reconciliation.
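
The self-healing behaviour is easier to see in a toy model. The following Python sketch is an assumption-laden illustration, not kro's code: the cluster is a plain dict, specs are reduced to one or two fields, and a single reconcile function compares desired state with actual state. Real controllers are driven by watches and do far more (ownership, deletion, status updates).

```python
import copy

# Desired state for the session-cache instance, keyed by (kind, name).
# Specs are cut down to a couple of fields purely for illustration.
DESIRED = {
    ("ConfigMap", "session-cache-config"): {"MAXMEMORY": "512mb"},
    ("StatefulSet", "session-cache"): {"replicas": 1},
    ("Service", "session-cache-headless"): {"clusterIP": "None", "port": 6379},
    ("Service", "session-cache"): {"port": 6379},
}


def reconcile(desired, cluster):
    """Run one pass: mutate `cluster` to match `desired`, return the actions taken."""
    actions = []
    for key, spec in desired.items():
        if key not in cluster:
            cluster[key] = copy.deepcopy(spec)   # missing object: create it
            actions.append(("create", key))
        elif cluster[key] != spec:
            cluster[key] = copy.deepcopy(spec)   # drifted object: overwrite it
            actions.append(("update", key))
    return actions


if __name__ == "__main__":
    cluster = {}
    print(reconcile(DESIRED, cluster))             # first pass creates everything
    del cluster[("Service", "session-cache-headless")]
    print(reconcile(DESIRED, cluster))             # only the deleted Service returns
    print(reconcile(DESIRED, cluster))             # nothing left to do
```

The point of the sketch is that the function is idempotent: once the cluster matches the desired state, another pass does nothing, and any manual deletion or edit shows up as a difference on the next pass.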

kro vs. Helm

Helm and kro both sit in the "abstraction over a set of K8s resources" space, but their execution models are fundamentally different.

| | Helm | kro |
|---|---|---|
| Model | Imperative install/upgrade/rollback | Declarative continuous reconciliation |
| Render time | At helm install / helm upgrade | Continuously, on every change |
| Self-healing | No: drift is not detected | Yes: reconciliation loop corrects drift |
| Ordering | Basic hooks, no readiness-aware DAG | Full dependency graph with readiness gates |
| Status | No native aggregation | Status values surfaced from child resources |
| Schema validation | values.yaml is untyped; JSON Schema possible | Typed schema with defaults and constraints |
| Templating | Go templates + Sprig | CEL expressions |
| Distribution | Chart registries (OCI / Artifact Hub) | RGDs live as custom resources in the cluster |
| Lifecycle | User-triggered; release history in Secrets | Controller-driven; no release history |

The biggest difference: Helm renders once and walks away. If someone deletes a Deployment that Helm created, it stays deleted until the next helm upgrade. kro detects drift and reconciles it back. That is the difference between a package manager and an operator.

Helm charts are also easier to share — push to an OCI registry, anyone can install. kro RGDs are cluster-scoped objects; distributing them means distributing YAML files, not a proper release artifact. On the other hand, Helm's Go-template syntax becomes genuinely painful at scale. CEL is typed, terminates by definition, and can be statically analysed before anything runs.

kro vs. Kubernetes operators

A proper Kubernetes operator (kubebuilder, controller-runtime, Kopf, etc.) is a controller written in code. kro automates what an operator does for a specific class of problems — "create and manage a fixed set of K8s resources per instance" — and makes it declarative.

| | Custom operator | kro |
|---|---|---|
| Implementation | Go / Python / any language | YAML + CEL |
| Reconciliation loop | You write it | kro runs it |
| Logic | Arbitrary: state machines, external API calls, mutations | CEL only: no side effects, non-Turing-complete |
| External calls | Yes: call any API, write to DBs, etc. | No: CEL has no I/O |
| Complex state machines | Yes | No |
| Watches | Configurable: any resource type | Child resources only |
| Webhook support | Yes (admission, conversion) | Not directly |
| Build + deploy | Build binary, containerize, deploy controller pod | Apply an RGD YAML |
| CRD versioning | Full multi-version support | Single version (multi-version planned) |

The key constraint: CEL expressions in kro have no side effects and always terminate. That is a feature for auditability — you can prove what a definition does — but it rules out anything that goes beyond wiring Kubernetes resources together. If your operator needs to call the AWS SDK directly, update a database, or implement a non-trivial state machine, you need real code.

For the large class of operators that mostly create, update, and delete standard Kubernetes (or CRD-backed) resources in response to a custom resource, kro covers the use case entirely.

Can kro build toy operators?

Yes, and it is surprisingly good at it.

The standard advice for learning the operator pattern is "use kubebuilder and implement a simple controller." That workflow requires Go, a proper dev environment, kubebuilder scaffolding, controller-gen for CRD generation, building a container, pushing it to a registry, and deploying a controller Deployment. Getting through all that before you can reconcile your first resource can easily take a day.

With kro, a toy operator is:

bash
kubectl apply -f my-rgd.yaml      # register CRD + start controller
kubectl apply -f my-instance.yaml # create an instance
kubectl get myresource             # observe reconciliation

That is the operator pattern — CRD, controller, reconciliation loop — in three commands and a YAML file.

One feature not shown in the Redis RGD is conditional resources via includeWhen. A resource tagged with includeWhen is only created when the CEL expression evaluates to true — and if it evaluates to false, every node that depends on it is also dropped from the graph. A MicroService RGD that optionally creates an Ingress:

yaml
    - id: ingress
      includeWhen:
        - ${schema.spec.enableIngress}
      template:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        ...

With enableIngress: false, the Ingress is never created, never tracked, never reconciled — it does not exist in kro's graph for that instance. Flip it to true and kro creates it on the next reconciliation cycle.
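
The pruning rule (drop the excluded node, then drop everything that depends on it) can be sketched in a few lines of Python. This is an illustration under assumptions, not kro's implementation: the graph below is a hypothetical MicroService layout, and the ingressInfo node is invented purely to show a dependent being removed along with the Ingress.

```python
# Hypothetical MicroService graph: resource id -> ids it depends on.
GRAPH = {
    "deployment": set(),
    "service": {"deployment"},
    "ingress": {"service"},
    "ingressInfo": {"ingress"},  # invented node that references the Ingress
}


def active_resources(graph, include):
    """Return the ids left after applying includeWhen results.

    `include` maps a resource id to the boolean its includeWhen evaluated to;
    ids missing from `include` are treated as always included.
    """
    excluded = {rid for rid in graph if not include.get(rid, True)}
    changed = True
    while changed:  # keep sweeping until no new dependents are found
        changed = False
        for rid, deps in graph.items():
            if rid not in excluded and deps & excluded:
                excluded.add(rid)
                changed = True
    return set(graph) - excluded


if __name__ == "__main__":
    print(sorted(active_resources(GRAPH, {"ingress": False})))
    print(sorted(active_resources(GRAPH, {"ingress": True})))
```

With ingress set to False, only deployment and service remain; flipping it to True restores all four nodes, which mirrors the enableIngress toggle described above.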

What the Redis and MicroService examples together teach, without a line of Go:

  • Dependency ordering inferred from CEL expression analysis — no explicit dependsOn
  • Readiness gating — upstream resources must satisfy readyWhen before dependents are created
  • Parallel creation for nodes with no mutual dependency (the headless and client Services above)
  • Conditional subgraphs — entire branches included or excluded at runtime
  • Status aggregation — values from child resources projected back to the parent CR
  • Self-healing — drift in any managed resource triggers reconciliation

Where you hit the wall:

  • External API calls (CEL has no I/O — no AWS SDK, no HTTP requests)
  • Non-trivial state machines (kro has one reconciliation loop; no branching on historical state)
  • Watching resources outside the managed set
  • Admission or conversion webhooks

The ceiling is real, but it covers most of what a first or second operator needs to do.

Beyond basics: collections and RGD chaining

Collections (forEach) expand one resource template into N resources from a list or range. A Redis RGD with shards: 3 that expands into three StatefulSet instances, each with a different index, is a single resource definition with forEach: ${lists.range(schema.spec.shards)}. No loops in code, no Helm range template — the expansion is part of the graph.
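
As a rough mental model of that expansion, the Python sketch below stamps out one copy of a template per item. It makes several assumptions: the iterator variable name shard and the ${shard} placeholder are hypothetical, plain string substitution stands in for CEL evaluation, and kro's real forEach syntax and semantics may differ.

```python
# Hypothetical template for one shard; only the fields needed for the demo.
TEMPLATE = {
    "kind": "StatefulSet",
    "metadata": {"name": "cache-shard-${shard}", "labels": {"shard": "${shard}"}},
}


def render(node, bindings):
    """Return a copy of `node` with every ${name} replaced from `bindings`."""
    if isinstance(node, dict):
        return {key: render(value, bindings) for key, value in node.items()}
    if isinstance(node, list):
        return [render(value, bindings) for value in node]
    if isinstance(node, str):
        for name, value in bindings.items():
            node = node.replace("${" + name + "}", str(value))
        return node
    return node  # numbers, booleans, None pass through unchanged


def expand(template, items, var="shard"):
    """Produce one rendered resource per item, leaving `template` untouched."""
    return [render(template, {var: item}) for item in items]


if __name__ == "__main__":
    for resource in expand(TEMPLATE, range(3)):
        print(resource["metadata"]["name"])
```

Three shards produce cache-shard-0, cache-shard-1, and cache-shard-2 from a single definition, which is the behaviour the shards: 3 example describes.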

RGD chaining lets one instance reference outputs from another. A Database RGD exposes status.endpoint; an Application RGD consumes it via an external reference. This is GitOps-friendly composition: each RGD is a unit of abstraction, and you compose them by referencing their status fields. The dependency management across instances is still graph-based — kro waits for the upstream instance to be ready before reconciling the downstream one.

Where kro fits in the ecosystem

Helm gives you templated installs with no continuous reconciliation. kubebuilder / operator-sdk gives you a full operator with arbitrary logic. kro sits between: real continuous reconciliation and a real CRD, constrained to wiring Kubernetes resources via CEL.

The sweet spot is platform engineering — the Application CR that wraps things like Deployment + HorizontalPodAutoscaler + PodDisruptionBudget + ServiceAccount + NetworkPolicy behind five fields. That is the use case kro was designed for, and it covers it completely without requiring a single line of controller code.

For simple operators, kro removes every barrier that usually stops people from experimenting with the operator pattern. No Go toolchain, no container registry, no deployment manifests for the controller itself — just an RGD YAML and a running kro installation.