Architecture
kruntimes separates Kubernetes-level capacity management from request-level Run assignment.
Components
| Component | Responsibility |
|---|---|
| Run CRD | Durable record for one execution: runtime, input, assignment, phase, retry policy, timestamps, outputs, and artifact refs. |
| Runtime CRD | Defines a Runtime Pod pool, capacity, pod template, and artifact store configuration. |
| Runtime Controller | Reconciles Runtime CRs into Deployments, RBAC, NetworkPolicy, and runtime maintainer Deployments. |
| Scheduler | Watches Pending Runs and assigns them to healthy Runtime Pods with available capacity. |
| Runtimed | Sidecar in each Runtime Pod. Claims assigned Runs, calls the local Runtime Server, updates Run status, uploads artifacts, and emits structured logs. |
| Runtime Server | Local gRPC server that performs execution. Built-in implementations include Bash and Python. |
| Runtime Maintainer | Long-running Runtime-scope worker for maintenance that must outlive individual Runtime Pods, including artifact cleanup. |
| Stale Run Reaper | Detects assigned Runs whose Runtime Pod disappeared or stopped heartbeating and applies retry or terminal failure policy. |
| krt | CLI for creating, watching, cancelling, and inspecting Runs, and for viewing their logs. |
Control Flow
User / krt
│
▼
Run CRD: Pending
│
▼
Scheduler
│ selects healthy Runtime Pod with available capacity
▼
Run CRD: Scheduled + assignedPod
│
▼
runtimed sidecar in assigned Runtime Pod
│ claims Run and calls local Runtime Server
▼
Runtime Server
│ executes workload
▼
runtimed updates Run status, outputs, artifacts, and logs
Runtime Pod Model
Runtime Pod
├── runtimed sidecar
└── runtime container
    └── Runtime Server gRPC endpoint
runtimed owns Kubernetes communication. Runtime Servers only implement the
local execution protocol.
Scheduling Model
Runtime Pods expose:
- Kubernetes PodReady,
- kruntimes.io/RuntimedReady heartbeat,
- runtime labels,
- static capacity annotations.
The scheduler derives fast-changing usage from Run state, not from pod annotations. A Runtime Pod is a candidate only when it is ready, its heartbeat is fresh, and its current usage is below capacity.
When a Run stops consuming capacity, the scheduler wakes Pending Runs for the same namespace and runtime. A periodic retry remains as a fallback.
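The candidate rule above can be sketched as follows. This is a minimal illustration, not the scheduler's actual code: the type names, the 30-second freshness window, the set of phases that count toward usage, and the least-loaded tie-break are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical values: the real freshness window and capacity rules may differ.
HEARTBEAT_TTL = timedelta(seconds=30)
CAPACITY_PHASES = {"Scheduled", "Running"}  # phases assumed to consume capacity


@dataclass
class RuntimePod:
    name: str
    ready: bool               # Kubernetes PodReady
    last_heartbeat: datetime  # time of the last RuntimedReady heartbeat
    capacity: int             # from the static capacity annotation


@dataclass
class Run:
    name: str
    phase: str
    assigned_pod: Optional[str] = None


def usage_by_pod(runs):
    """Derive per-pod usage from Run state, not from pod annotations."""
    usage = {}
    for run in runs:
        if run.assigned_pod and run.phase in CAPACITY_PHASES:
            usage[run.assigned_pod] = usage.get(run.assigned_pod, 0) + 1
    return usage


def pick_pod(pods, runs, now):
    """Return a ready, fresh, below-capacity pod, or None if none qualifies."""
    usage = usage_by_pod(runs)
    candidates = [
        p for p in pods
        if p.ready
        and now - p.last_heartbeat <= HEARTBEAT_TTL
        and usage.get(p.name, 0) < p.capacity
    ]
    if not candidates:
        return None
    # Assumed policy: prefer the least-loaded pod, break ties by name.
    return min(candidates, key=lambda p: (usage.get(p.name, 0), p.name))
```

Because usage is recomputed from Runs on every decision, a pod's annotations never need to change as Runs start and finish.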
State Model
Run.status.phase uses these phases:
- Pending
- Scheduled
- Running
- Succeeded
- Failed
- Timeout
- Cancelled
Terminal conditions are normalized for failed, timeout, and cancelled outcomes.
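As a rough sketch, the phases can be modelled as an enum with a transition check. The phase names come from the list above; the transition table is an assumption inferred from the control flow, and the idea that a retried Run returns to Pending is a guess rather than documented behavior.

```python
from enum import Enum


class Phase(str, Enum):
    PENDING = "Pending"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    TIMEOUT = "Timeout"
    CANCELLED = "Cancelled"


# Phases after which a Run no longer changes.
TERMINAL = frozenset(
    {Phase.SUCCEEDED, Phase.FAILED, Phase.TIMEOUT, Phase.CANCELLED}
)

# Assumed transition table (hypothetical); terminal phases have no exits.
ALLOWED = {
    Phase.PENDING: {Phase.SCHEDULED, Phase.CANCELLED},
    Phase.SCHEDULED: {Phase.RUNNING, Phase.PENDING, Phase.FAILED, Phase.CANCELLED},
    Phase.RUNNING: {
        Phase.SUCCEEDED, Phase.FAILED, Phase.TIMEOUT,
        Phase.CANCELLED, Phase.PENDING,
    },
}


def can_transition(src: Phase, dst: Phase) -> bool:
    """Return True if moving from src to dst is allowed by the assumed table."""
    return dst in ALLOWED.get(src, set())
```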
Data Boundaries
Kubernetes stores compact control-plane state:
- lifecycle phase,
- assignment,
- bounded outputs,
- artifact references,
- timestamps,
- conditions.
Large data stays outside etcd:
- full stdout/stderr in structured logs,
- artifact files in the configured ArtifactStore,
- runtime-local execution state until Forget.
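One way to picture this boundary: the Run status carries only a bounded slice of output plus a reference to the full data. The sketch below is illustrative only; the 4096-byte limit, the field names, the choice to keep the tail of the output, and the example URI are all hypothetical.

```python
MAX_OUTPUT_BYTES = 4096  # hypothetical bound on output stored in Run status


def build_status_output(stdout: bytes, artifact_uri: str,
                        limit: int = MAX_OUTPUT_BYTES) -> dict:
    """Build a compact status fragment: bounded output plus an artifact ref."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    truncated = len(stdout) > limit
    # Assumed policy: keep the tail, where errors usually appear.
    kept = stdout[-limit:] if truncated else stdout
    return {
        "output": kept.decode("utf-8", errors="replace"),
        "truncated": truncated,
        # The full stdout lives in the artifact store, not in etcd.
        "artifactRefs": [artifact_uri],
    }


status = build_status_output(b"x" * 5000, "s3://example-bucket/run-1/stdout.log")
```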
Design Decisions
- No request-time Pod creation: warm Runtime Pods absorb short Runs.
- CRDs as source of truth: Kubernetes remains the durable control plane.
- Runtime Servers are local: no global runtime service mesh is required.
- At-least-once execution: runtimed and the Stale Run Reaper share retry semantics, so a Run may execute more than once.
- Trusted built-ins: built-in runtimes trade isolation for low latency.
See Security and Threat Model for isolation limits.
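To illustrate the at-least-once decision, the following sketch shows how a Stale Run Reaper might choose between keeping, retrying, and failing an assigned Run. The 60-second staleness threshold, the `attempts` and `max_attempts` fields, and the string results are assumptions for the example, not the project's real retry policy schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(seconds=60)  # hypothetical staleness threshold


@dataclass
class AssignedRun:
    name: str
    attempts: int      # attempts made so far (hypothetical field)
    max_attempts: int  # retry-policy limit (hypothetical field)


def reap_decision(run: AssignedRun,
                  pod_last_heartbeat: Optional[datetime],
                  now: datetime) -> str:
    """Return 'keep', 'retry', or 'fail' for an assigned Run.

    A heartbeat of None means the Runtime Pod no longer exists.
    """
    pod_gone = pod_last_heartbeat is None
    stale = pod_gone or now - pod_last_heartbeat > STALE_AFTER
    if not stale:
        return "keep"
    # The workload may already have run, so a retry can execute it again.
    if run.attempts < run.max_attempts:
        return "retry"
    return "fail"
```

Since a retried Run may repeat work that already happened, workloads are safest when they are idempotent.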