Architecture

kruntimes separates Kubernetes-level capacity management from request-level Run assignment.

Components

Component           Responsibility
Run CRD             Durable record for one execution: runtime, input, assignment, phase, retry policy, timestamps, outputs, and artifact refs.
Runtime CRD         Defines a Runtime Pod pool, capacity, pod template, and artifact store configuration.
Runtime Controller  Reconciles Runtime CRs into Deployments, RBAC, NetworkPolicy, and runtime maintainer Deployments.
Scheduler           Watches Pending Runs and assigns them to healthy Runtime Pods with available capacity.
Runtimed            Sidecar in each Runtime Pod. Claims assigned Runs, calls the local Runtime Server, updates Run status, uploads artifacts, and emits structured logs.
Runtime Server      Local gRPC server that performs execution. Built-in implementations include Bash and Python.
Runtime Maintainer  Long-running Runtime-scope worker for maintenance that must outlive individual Runtime Pods, including artifact cleanup.
Stale Run Reaper    Detects assigned Runs whose Runtime Pod disappeared or stopped heartbeating and applies retry or terminal failure policy.
krt                 CLI for creating, watching, cancelling, logging, and inspecting Runs.
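
To make the shape of a Run record more concrete, the following is a minimal sketch of the fields listed for the Run CRD, modelled as Python dataclasses. Every field name, type, and default here is an assumption made for illustration; the actual CRD schema may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetryPolicy:
    # Hypothetical field: total number of execution attempts allowed.
    max_attempts: int = 1


@dataclass
class RunStatus:
    phase: str = "Pending"
    assigned_pod: Optional[str] = None   # assignment, filled in by the Scheduler
    started_at: Optional[str] = None     # timestamps (format is an assumption)
    finished_at: Optional[str] = None
    outputs: dict = field(default_factory=dict)        # bounded outputs
    artifact_refs: list = field(default_factory=list)  # references, not file contents


@dataclass
class Run:
    name: str
    runtime: str   # which Runtime pool should execute this Run
    input: dict
    retry_policy: RetryPolicy = field(default_factory=RetryPolicy)
    status: RunStatus = field(default_factory=RunStatus)


run = Run(name="demo", runtime="python", input={"code": "print(1)"})
```

A freshly created Run in this sketch starts in the Pending phase with no assignment, which matches the starting point of the control flow described next.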

Control Flow

User / krt
  │
  ▼
Run CRD: Pending
  │
  ▼
Scheduler
  │ selects healthy Runtime Pod with available capacity
  ▼
Run CRD: Scheduled + assignedPod
  │
  ▼
runtimed sidecar in assigned Runtime Pod
  │ claims Run and calls local Runtime Server
  ▼
Runtime Server
  │ executes workload
  ▼
runtimed updates Run status, outputs, artifacts, and logs
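
The flow above can be walked through with a small in-memory simulation. This is only a sketch: the `Run` class, the `schedule` and `runtimed_step` functions, and the use of a plain callable in place of the Runtime Server gRPC call are all hypothetical stand-ins, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Run:
    name: str
    input: str
    phase: str = "Pending"
    assigned_pod: Optional[str] = None
    outputs: dict = field(default_factory=dict)


def schedule(run: Run, pod_name: str) -> None:
    """Scheduler step: record the assignment on the Run."""
    run.assigned_pod = pod_name
    run.phase = "Scheduled"


def runtimed_step(run: Run, my_pod: str, server: Callable[[str], str]) -> bool:
    """runtimed step: claim a Run assigned to this pod and execute it.

    `server` stands in for the call to the local Runtime Server.
    Returns True if this pod handled the Run.
    """
    if run.phase != "Scheduled" or run.assigned_pod != my_pod:
        return False  # not assigned to this pod, so leave it alone
    run.phase = "Running"
    try:
        run.outputs["result"] = server(run.input)
        run.phase = "Succeeded"
    except Exception as exc:
        run.outputs["error"] = str(exc)
        run.phase = "Failed"
    return True


run = Run(name="demo", input="hello")
schedule(run, "pod-a")
handled_by_b = runtimed_step(run, "pod-b", str.upper)  # wrong pod: ignored
handled_by_a = runtimed_step(run, "pod-a", str.upper)  # assigned pod: executes
```

The point of the sketch is the division of labour: the scheduler only writes the assignment, and only the sidecar in the assigned pod moves the Run forward.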

Runtime Pod Model

Runtime Pod
├── runtimed sidecar
└── runtime container
    └── Runtime Server gRPC endpoint

runtimed owns Kubernetes communication. Runtime Servers only implement the local execution protocol.

Scheduling Model

Runtime Pods expose:

  • Kubernetes PodReady,
  • kruntimes.io/RuntimedReady heartbeat,
  • runtime labels,
  • static capacity annotations.

The scheduler derives fast-changing usage from Run state, not from pod annotations. A Runtime Pod is a candidate only when it is ready (PodReady), fresh (its RuntimedReady heartbeat is recent), and below its static capacity.
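
As a rough illustration of the candidate rule, the sketch below counts usage from Run state and filters pods by readiness, heartbeat freshness, and capacity. The type names, the 30-second freshness window, and the assumption that Scheduled and Running Runs are the ones consuming capacity are all illustrative choices, not taken from the implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

ACTIVE_PHASES = {"Scheduled", "Running"}  # assumed to consume capacity
HEARTBEAT_TTL_SECONDS = 30.0              # hypothetical freshness window


@dataclass
class PodInfo:
    name: str
    ready: bool            # Kubernetes PodReady
    last_heartbeat: float  # time of the last RuntimedReady heartbeat, in seconds
    capacity: int          # from the static capacity annotation


@dataclass
class RunInfo:
    phase: str
    assigned_pod: Optional[str]


def usage_by_pod(runs: List[RunInfo]) -> Counter:
    """Derive per-pod usage from Run state rather than from pod annotations."""
    return Counter(
        r.assigned_pod for r in runs if r.assigned_pod and r.phase in ACTIVE_PHASES
    )


def candidates(pods: List[PodInfo], runs: List[RunInfo], now: float) -> List[str]:
    used = usage_by_pod(runs)
    return [
        p.name
        for p in pods
        if p.ready
        and now - p.last_heartbeat <= HEARTBEAT_TTL_SECONDS
        and used[p.name] < p.capacity
    ]


pods = [
    PodInfo("pod-a", ready=True, last_heartbeat=95.0, capacity=2),
    PodInfo("pod-b", ready=True, last_heartbeat=10.0, capacity=2),   # stale heartbeat
    PodInfo("pod-c", ready=False, last_heartbeat=99.0, capacity=2),  # not ready
    PodInfo("pod-d", ready=True, last_heartbeat=99.0, capacity=1),   # full
]
runs = [
    RunInfo("Running", "pod-a"),
    RunInfo("Scheduled", "pod-d"),
    RunInfo("Succeeded", "pod-a"),  # finished Runs no longer count
]
result = candidates(pods, runs, now=100.0)
```

In this example only `pod-a` qualifies: it is ready, its heartbeat is 5 seconds old, and it has one active Run against a capacity of two.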

When a Run stops consuming capacity, the scheduler wakes Pending Runs for the same namespace and runtime. A periodic retry remains as a fallback.
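
One way to picture the wake-up behaviour is a parking structure keyed by namespace and runtime. This is a sketch under assumptions: the `PendingQueue` class and its method names are hypothetical, and the real scheduler may track Pending Runs quite differently.

```python
from collections import defaultdict, deque


class PendingQueue:
    """Parks Run names that could not be placed, grouped by (namespace, runtime)."""

    def __init__(self):
        self._waiting = defaultdict(deque)

    def park(self, namespace, runtime, run_name):
        self._waiting[(namespace, runtime)].append(run_name)

    def wake(self, namespace, runtime):
        """Capacity was freed: return the parked Runs for this group, oldest first."""
        return list(self._waiting.pop((namespace, runtime), ()))

    def wake_all(self):
        """Periodic fallback: retry everything in case a wake-up was missed."""
        woken = [name for group in self._waiting.values() for name in group]
        self._waiting.clear()
        return woken


queue = PendingQueue()
queue.park("team-a", "python", "run-1")
queue.park("team-a", "python", "run-2")
queue.park("team-b", "bash", "run-3")

woken_on_release = queue.wake("team-a", "python")  # a team-a/python Run finished
woken_on_tick = queue.wake_all()                   # periodic retry picks up the rest
```

Keying by namespace and runtime means a finished Run only triggers retries for Runs that could plausibly use the freed slot.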

State Model

Run.status.phase takes one of these values:

  • Pending
  • Scheduled
  • Running
  • Succeeded
  • Failed
  • Timeout
  • Cancelled

Terminal conditions are normalized for failed, timeout, and cancelled outcomes.
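
The phase list maps naturally onto an enumeration. The sketch below is illustrative only; in particular, treating Succeeded, Failed, Timeout, and Cancelled as the terminal phases is an assumption inferred from the list, and the helper names are hypothetical.

```python
from enum import Enum


class Phase(str, Enum):
    PENDING = "Pending"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    TIMEOUT = "Timeout"
    CANCELLED = "Cancelled"


# Assumption: these four phases are final and never change again.
TERMINAL_PHASES = frozenset(
    {Phase.SUCCEEDED, Phase.FAILED, Phase.TIMEOUT, Phase.CANCELLED}
)


def is_terminal(phase: Phase) -> bool:
    return phase in TERMINAL_PHASES


def parse_phase(value: str) -> Phase:
    """Convert a raw status string into a Phase, rejecting unknown values."""
    return Phase(value)  # raises ValueError for anything not listed above
```

Parsing through the enum gives callers a single place where an unexpected phase string is rejected instead of being silently passed along.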

Data Boundaries

Kubernetes stores compact control-plane state:

  • lifecycle phase,
  • assignment,
  • bounded outputs,
  • artifact references,
  • timestamps,
  • conditions.

Large data stays outside etcd:

  • full stdout/stderr in structured logs,
  • artifact files in the configured ArtifactStore,
  • runtime-local execution state until Forget.
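
A writer that respects this boundary might keep only a bounded preview in Run status and send the full text elsewhere. The following is a minimal sketch: the 1024-byte limit, the `split_output` function, the key layout, and the use of a plain dict as the external sink are all assumptions for illustration.

```python
from typing import Optional, Tuple

MAX_STATUS_OUTPUT_BYTES = 1024  # hypothetical bound for data stored in status


def split_output(run_name: str, stdout: str, sink: dict) -> Tuple[dict, Optional[str]]:
    """Return (status_fragment, external_key).

    Small outputs are stored inline. Large outputs are written to `sink`
    (a stand-in for external storage) and only a preview stays in status.
    """
    data = stdout.encode("utf-8")
    if len(data) <= MAX_STATUS_OUTPUT_BYTES:
        return {"stdout": stdout, "truncated": False}, None

    key = f"{run_name}/stdout.txt"
    sink[key] = data  # full content lives outside the control plane
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    preview = data[:MAX_STATUS_OUTPUT_BYTES].decode("utf-8", errors="ignore")
    return {"stdout": preview, "truncated": True}, key


sink = {}
small_status, small_key = split_output("run-1", "ok", sink)
large_status, large_key = split_output("run-2", "x" * 5000, sink)
```

Whatever the actual limit is, the idea is that the size of a Run object stays predictable regardless of how much a workload prints.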

Design Decisions

  • No request-time Pod creation: warm Runtime Pods absorb short Runs.
  • CRDs as source of truth: Kubernetes remains the durable control plane.
  • Runtime Servers are local: no global runtime service mesh is required.
  • At-least-once execution: runtimed and the Stale Run Reaper share retry semantics.
  • Trusted built-ins: built-in runtimes trade isolation for low latency.
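
The at-least-once decision can be illustrated with a sketch of the check a stale-run reaper might perform. The field names, the 30-second heartbeat window, and the three-way `keep` / `retry` / `fail` result are assumptions for illustration, not the project's actual policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Run:
    phase: str
    assigned_pod: Optional[str]
    attempts: int       # attempts made so far (hypothetical field)
    max_attempts: int   # from the retry policy (hypothetical field)


def reap_decision(
    run: Run, last_heartbeat: Dict[str, float], now: float, ttl: float = 30.0
) -> str:
    """Return 'keep', 'retry', or 'fail' for one Run."""
    if run.assigned_pod is None or run.phase not in ("Scheduled", "Running"):
        return "keep"  # nothing assigned, or already finished

    beat = last_heartbeat.get(run.assigned_pod)  # None means the pod is gone
    if beat is not None and now - beat <= ttl:
        return "keep"  # pod is alive and heartbeating

    # The pod vanished or went silent. Retrying may repeat work that had
    # already started, which is why execution is at-least-once.
    return "retry" if run.attempts < run.max_attempts else "fail"


heartbeats = {"pod-a": 95.0, "pod-b": 10.0}  # pod-c is absent entirely
healthy = reap_decision(Run("Running", "pod-a", 1, 3), heartbeats, now=100.0)
silent = reap_decision(Run("Running", "pod-b", 1, 3), heartbeats, now=100.0)
gone = reap_decision(Run("Scheduled", "pod-c", 3, 3), heartbeats, now=100.0)
```

Because a retried Run may already have produced side effects on the lost pod, workloads are safest when they are idempotent.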

See Security and Threat Model for isolation limits.