Operations Guide
This guide covers day-two operations for the current cluster-wide kruntimes installation model.
kruntimes is still v0.x experimental with v1alpha1 APIs. Back up manifests
and values before upgrades, and read the release notes for breaking changes.
Installation
Install the platform chart once per cluster. It installs CRDs, scheduler, controller, cluster-scoped RBAC, metrics Services, and optional monitoring resources.
helm upgrade --install kruntimes ./charts/kruntimes \
--namespace kruntimes-system \
--create-namespace \
--set scheduler.image=<scheduler-image> \
--set controller.image=<controller-image> \
--set runtimed.image=<runtimed-image>
Install built-in Runtime CRs into each namespace that should host Runtime Pods and Runs:
helm upgrade --install kruntimes-runtimes ./charts/kruntimes-runtimes \
--namespace default \
--create-namespace \
--set bash.image=<bash-runtime-image> \
--set python.image=<python-runtime-image>
Use explicit image tags or digests for shared clusters. Do not depend on mutable local image names outside development.
Upgrade
Before upgrading:
- Read the release notes and
CHANGELOG.md. - Check
docs/compatibility.mdfor Kubernetes, Helm, Go, Python, and CLI compatibility changes. - Back up Helm values and kruntimes custom resources.
- Run the release preflight checks if building from source.
Upgrade the platform first so CRDs and controllers are ready for any Runtime schema or behavior changes:
helm upgrade kruntimes ./charts/kruntimes \
--namespace kruntimes-system \
--reuse-values
Then upgrade Runtime definitions in each workload namespace:
helm upgrade kruntimes-runtimes ./charts/kruntimes-runtimes \
--namespace default \
--reuse-values
After upgrading, verify the control plane and Runtime Pods:
kubectl get deploy -n kruntimes-system
kubectl get runtime,runs -A
kubectl get pods -A -l runtime
Uninstall
Remove built-in Runtime releases from workload namespaces before removing the platform:
helm uninstall kruntimes-runtimes --namespace default --ignore-not-found
helm uninstall kruntimes --namespace kruntimes-system --ignore-not-found
The platform uninstall does not delete CRDs by default. This is intentional:
deleting CRDs deletes all Runtime, Run, and Workflow objects cluster-wide.
Only remove CRDs after backing up or intentionally discarding those objects:
kubectl delete crd runs.kruntimes.io runtimes.kruntimes.io workflows.kruntimes.io
External artifact stores, object buckets, PVC contents, and external logging systems are not deleted by Helm uninstall.
Backup
At minimum, back up:
- Helm release values for the platform and Runtime charts.
Runtime,Run, andWorkflowcustom resources from namespaces you intend to restore.- Secrets referenced by Runtime artifact stores or custom Runtime Pod templates.
- PVCs or object storage buckets used by artifact stores.
Example Kubernetes object backup:
kubectl get runtime,runs,workflows -A -o yaml > kruntimes-objects.yaml
helm get values kruntimes -n kruntimes-system --all > kruntimes-values.yaml
helm get values kruntimes-runtimes -n default --all > kruntimes-runtimes-values.yaml
Artifact and log backups are backend-specific. For filesystem artifact stores, back up the referenced PVCs. For S3-compatible stores, back up the bucket or prefix according to the provider’s procedures. kruntimes does not manage log collection storage.
Restore
Restore in this order:
- Restore external dependencies such as Secrets, PVCs, object buckets, and logging backends.
- Install or upgrade the platform chart.
- Restore
Runtimeobjects and Runtime chart releases. - Restore
RunandWorkflowobjects only when their referenced Runtime and artifact store configuration are available.
Artifact cleanup uses the persisted Run.status.artifactStore snapshot. For
old Runs that do not have this snapshot, the controller may need the original
Runtime artifact store configuration to continue cleanup.
Troubleshooting
Run stays Pending
Check that a Runtime with the requested name exists in the same namespace and that its Pods are ready:
kubectl get runtime,pods -n <namespace>
kubectl describe run <run> -n <namespace>
The scheduler only assigns to Runtime Pods that are Kubernetes Ready,
kruntimes.io/RuntimedReady, and below configured capacity.
Run is Scheduled but not Running
Check the assigned Pod and runtimed logs:
kubectl get run <run> -n <namespace> -o yaml
kubectl logs <runtime-pod> -n <namespace> -c runtimed
If the Runtime Pod disappeared, retry behavior depends on Run.spec.retry.
Runtime Pods are not Ready
Inspect the Runtime, generated Deployment, and Pod events:
kubectl describe runtime <runtime> -n <namespace>
kubectl describe deploy runtime-<runtime> -n <namespace>
kubectl describe pod <runtime-pod> -n <namespace>
Common causes include image pull failures, insufficient resources, missing Secrets, invalid Pod template fields, or a Runtime Server that does not answer health checks.
Artifact cleanup is stuck
Check the Run finalizers, stored artifact configuration, and runtime maintainer Deployment:
kubectl get run <run> -n <namespace> -o yaml
kubectl get deploy,pods -n <namespace> -l app.kubernetes.io/component=runtime-maintainer
kubectl logs deploy/<runtime-maintainer-deployment> -n <namespace>
Missing artifact credentials, deleted PVCs, deleted buckets, or unavailable S3 endpoints can prevent finalizer cleanup until the dependency is restored.
krt cannot read logs or artifacts
krt logs and artifact downloads use Kubernetes port-forwarding to a Runtime
Pod. The user needs permission to get Runs and Pods and to create
pods/portforward in the namespace.
kubectl auth can-i get runs -n <namespace>
kubectl auth can-i get pods -n <namespace>
kubectl auth can-i create pods/portforward -n <namespace>