Deployment status lifecycle
Every deployment carries a single status field: the one value GET /deployments and ring deployment list surface, and the discriminant the outbound webhook deployment.status_changed event reports. This page is the canonical reference for what each status means, what moves a deployment between them, and which ones are final.
The status is computed by the reconciler: on every tick it observes the runtime, applies the desired state, and writes back the resulting status. A deployment.status_changed event is emitted whenever a tick lands the deployment on a different status than it had before.
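The emit rule above amounts to a change-detection step at the end of each tick. The following is a minimal sketch of that idea, not Ring's implementation: `Deployment`, `tick`, `observe_and_apply`, and the `deployment_id` field are hypothetical names chosen for illustration; only the `deployment.status_changed` event name and the `old_status`/`new_status` pair come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Deployment:
    id: str
    status: str


def tick(deployment: Deployment,
         observe_and_apply: Callable[[Deployment], str]) -> Optional[dict]:
    """Run one reconcile tick; return an event only if the status changed."""
    old_status = deployment.status
    new_status = observe_and_apply(deployment)  # hypothetical runtime hook
    deployment.status = new_status              # write back the resulting status
    if new_status == old_status:
        return None                             # same status: nothing to emit
    return {
        "type": "deployment.status_changed",
        "deployment_id": deployment.id,         # assumed field name
        "old_status": old_status,
        "new_status": new_status,
    }
```

A tick that lands on the same status returns `None`, so a subscriber sees one event per transition rather than one per tick.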
The fourteen statuses
All values serialize as snake_case, identically on the wire (JSON), in the CLI output, and in the SQLite deployment.status column.
Lifecycle states
| Status | Meaning |
|---|---|
| pending | Created in the database, no container/VM started yet. Short-lived, since the next tick moves it to creating. Rarely observed. |
| creating | The runtime is bringing instances up. Also the state a worker is held in by the readiness gate (see below) until its readiness checks are green. |
| running | Up and, when readiness checks are declared, actually ready (serving). Without readiness checks, running means simply "the container/VM is up". For a job, a transient state on the way to completed/failed. |
| completed | Jobs only. The one-shot task exited 0 (or, on Cloud Hypervisor, the guest shut down cleanly). Terminal. |
| deleted | Marked for teardown (via DELETE /deployments/{id} / ring deployment delete). The reconciler removes every instance, then purges the row. |
Failure states
| Status | Cause | Terminal? |
|---|---|---|
| failed | A job exited non-zero / crashed; or a readiness check never turned green before the deadline on a non-rolling deployment; or (Cloud Hypervisor) firmware not found. | Terminal |
| crash_loop_back_off | A worker's container/VM kept dying and restart_count reached MAX_RESTART_COUNT (5). The reconciler stops trying. | Terminal |
| insufficient_resources | The host doesn't have enough memory for the deployment's request. A retry can't conjure RAM, so Ring stops. | Terminal |
| image_pull_back_off | The image couldn't be pulled (tag not found, registry auth, image_pull_policy: Never forbidding a pull, transient network). | Retried |
| create_container_error | The runtime rejected container creation (invalid mount, unsupported option, a port conflict the daemon surfaces at create time). | Retried |
| network_error | Creating the namespace network/bridge failed. | Retried |
| config_error | A mounted config (or a key within it) doesn't exist in the namespace. | Retried |
| file_system_error | An IO error handling volumes or temp config files. | Retried |
| error | Generic runtime fallback: a stats fetch, a JSON parse, a VM-start failure, or any error not classified above. | Retried |
Terminal vs retried. A terminal status is never reconciled again, because the deployment is done. A retried failure status stays in the reconcile loop: each tick re-attempts the apply, bumping restart_count, until either it succeeds (back to creating → running) or the counter hits MAX_RESTART_COUNT and a worker flips to crash_loop_back_off. The reconciler explicitly polls pending, creating, running, deleted, and the six retried error states; completed, failed, crash_loop_back_off, and insufficient_resources are left out on purpose.
Recover from a terminal or stuck state by re-applying a fixed manifest: `ring apply` resets `restart_count` and re-enters the lifecycle from the top. The restart counter is cumulative over the deployment's lifetime, not a sliding window.
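The terminal/retried split described above can be written down as plain sets, which is handy when a client needs to decide whether a status can still change without a new `ring apply`. This is an illustrative sketch only: the status strings are the ones listed on this page, while the `Status` enum, the set names, and `is_terminal` are hypothetical helpers, not part of Ring.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    CREATING = "creating"
    RUNNING = "running"
    COMPLETED = "completed"
    DELETED = "deleted"
    FAILED = "failed"
    CRASH_LOOP_BACK_OFF = "crash_loop_back_off"
    INSUFFICIENT_RESOURCES = "insufficient_resources"
    IMAGE_PULL_BACK_OFF = "image_pull_back_off"
    CREATE_CONTAINER_ERROR = "create_container_error"
    NETWORK_ERROR = "network_error"
    CONFIG_ERROR = "config_error"
    FILE_SYSTEM_ERROR = "file_system_error"
    ERROR = "error"


# Statuses the reconciler leaves alone on purpose.
TERMINAL = frozenset({
    Status.COMPLETED, Status.FAILED,
    Status.CRASH_LOOP_BACK_OFF, Status.INSUFFICIENT_RESOURCES,
})

# Failure statuses the table marks "Retried".
RETRIED = frozenset({
    Status.IMAGE_PULL_BACK_OFF, Status.CREATE_CONTAINER_ERROR,
    Status.NETWORK_ERROR, Status.CONFIG_ERROR,
    Status.FILE_SYSTEM_ERROR, Status.ERROR,
})

# Everything else stays in the reconcile loop.
RECONCILED = frozenset(Status) - TERMINAL


def is_terminal(status: str) -> bool:
    """True if the wire value names a status that is never reconciled again."""
    return Status(status) in TERMINAL
```

For example, `is_terminal("image_pull_back_off")` is false because the reconciler keeps retrying it, while `is_terminal("insufficient_resources")` is true.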
Worker lifecycle
A kind: worker is a long-running service the reconciler keeps at exactly replicas instances.
Transitions:
- `pending` → `creating` → `running`: when readiness checks are declared, the gate holds the worker in `creating` while readiness is not green and releases it once ready. A `running` worker stays `running` and is reconciled each tick.
- `running` → `deleted`: you delete it.
- `running` → `crash_loop_back_off`: `restart_count` ≥ 5.
- `creating` → `image_pull_back_off` / `create_container_error` / `network_error` / `config_error` / `file_system_error` / `error`: retried; → `crash_loop_back_off` after `MAX_RESTART_COUNT`, or → `running` once resolved.
- `insufficient_resources`: terminal, host out of memory.
Notes:
- `creating → running` happens as soon as the container/VM is up unless the deployment declares a `readiness: true` health check, in which case the readiness gate holds it in `creating` until ready.
- `running` is stable. A liveness check failure doesn't move the status; it triggers the check's `on_failure` action (`restart` removes the instance and the reconciler recreates it; `stop` marks the deployment `deleted`; `alert` only emits an event). The status is not dragged back to `creating` once `running` is established.
- A worker never reaches `completed`; that status is jobs-only.
Job lifecycle
A kind: job runs one instance to completion (replicas is ignored).
Transitions:
- `pending` → `creating` → `running` → `completed`: exit 0 / clean guest shutdown.
- `running` → `failed`: non-zero exit, OOM, signal, or host-side timeout; also once `restart_count` ≥ `MAX_RESTART_COUNT`.
Notes:
- On Cloud Hypervisor the host can't read the guest's exit code, so any clean VM shutdown is `completed`. Use a worker if you need precise exit-code semantics on CH.
- Jobs are exempt from the readiness gate, so they go straight to `completed`/`failed` and never sit in a readiness-gated `running`.
The readiness gate
A worker that declares at least one readiness: true health check stays in creating until every readiness check has reported success continuously for its min_healthy_time (default 10s, anti-flap). Only then does it become running. This makes running mean the app is serving, not merely that the process started, which is what makes the deployment.status_changed → running event trustworthy for an external subscriber waiting to know a deploy is done.
While in creating, only the readiness checks run (recorded for the gate to read); they do not fire on_failure actions, since a probe that isn't green yet during boot isn't a failure. Liveness checks start only once the deployment is running.
Deadline. A simple deployment (no rolling-update parent) whose readiness never turns green would otherwise sit in creating forever. Past RING_ROLLOUT_DEADLINE (default 600s, the same knob as the rolling-update drain, mirroring Kubernetes' progressDeadlineSeconds) Ring marks it failed with a readiness_deadline_exceeded event. A rolling-update child is exempt here: its deadline is the forced parent drain (the old version keeps serving), described in Reconciliation → rolling updates.
Without any readiness check, the legacy behaviour is preserved: running as soon as the container is up. See Health checks (design) → the readiness gate for the full mechanics.
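The gate, the deadline, and the no-readiness fallback described in this section combine into a single decision. The sketch below illustrates that decision under stated assumptions and is not Ring's code: `ReadinessCheck`, `green_since`, `gate_status`, and `is_rolling_child` are hypothetical names, and measuring the deadline from the deployment's creation time is an assumption this page does not confirm. The 10s and 600s defaults are the ones quoted above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ROLLOUT_DEADLINE_S = 600.0  # RING_ROLLOUT_DEADLINE default quoted on this page


@dataclass
class ReadinessCheck:
    min_healthy_time_s: float = 10.0     # default quoted on this page
    green_since: Optional[float] = None  # start of the current success streak, if any


def gate_status(checks: Sequence[ReadinessCheck],
                created_at: float,
                now: float,
                is_rolling_child: bool = False,
                deadline_s: float = ROLLOUT_DEADLINE_S) -> str:
    """Status of a worker whose container/VM is already up."""
    if not checks:
        return "running"   # no readiness check declared: up means running
    all_ready = all(
        c.green_since is not None
        and now - c.green_since >= c.min_healthy_time_s
        for c in checks
    )
    if all_ready:
        return "running"   # every check green for its min_healthy_time
    # Assumption: the deadline clock starts at created_at.
    if not is_rolling_child and now - created_at > deadline_s:
        return "failed"    # reported with a readiness_deadline_exceeded event
    return "creating"      # still held by the gate
```

A check that turned green 5 seconds ago keeps the worker in `creating`; the same check 10 seconds into its streak lets it through to `running`.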
Restart counter and crash_loop_back_off
Ring tracks a cumulative restart_count per deployment. It is bumped when:
- A worker's container dies unexpectedly (Docker `die`/`oom`/`kill` events, or a CH VM going unresponsive), unless the shutdown was intentional (a delete/scale-down).
- A retried error status re-attempts its apply and fails again.
Once restart_count reaches MAX_RESTART_COUNT (5), the next tick flips a worker to crash_loop_back_off (terminal) and a job to failed (terminal): the reconciler stops retrying, protecting the host from a tight crash loop. The counter is cumulative for the deployment's lifetime, not a sliding window; ring apply with a fixed manifest resets it.
Counters live in memory only: restarting ring server clears them, and each (deployment, instance, check) triple starts back at zero afterwards.
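The restart rules in this section reduce to a counter with a fixed threshold. Here is a minimal sketch, assuming a per-deployment tracker object: `RestartTracker` and its method names are hypothetical, while `MAX_RESTART_COUNT = 5` and the worker/job outcomes are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RESTART_COUNT = 5  # value stated on this page


@dataclass
class RestartTracker:
    kind: str               # "worker" or "job"
    restart_count: int = 0  # cumulative, not a sliding window

    def record_failure(self) -> None:
        """An unexpected death or a failed retry bumps the counter."""
        self.restart_count += 1

    def reset(self) -> None:
        """What re-applying a fixed manifest does to the counter."""
        self.restart_count = 0

    def exhausted_status(self) -> Optional[str]:
        """Terminal status once the threshold is reached, else None."""
        if self.restart_count < MAX_RESTART_COUNT:
            return None
        return "crash_loop_back_off" if self.kind == "worker" else "failed"
```

Because the counter never decays on its own, five failures spread over weeks exhaust it just as five failures in a minute do; only `reset()` (the `ring apply` path) clears it.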
Observing the status
- API: `GET /deployments` and `GET /deployments/{id}` return the `status` field; filter with `GET /deployments?status=<value>`. See API reference → Deployments.
- CLI: `ring deployment list` shows a `Status` column; `--status <value>` (repeatable) filters. See CLI reference.
- Events: `ring deployment events <id>` shows the per-transition history (state changes, health-check actions, error reasons like `image_pull_back_off` or `readiness_deadline_exceeded`).
- Webhooks: subscribe to `deployment.status_changed` to be pushed every transition (`old_status` → `new_status`) instead of polling. See Subscribe to events with webhooks.
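A webhook subscriber waiting for a deploy to finish mostly needs to map `new_status` onto "done", "failed", or "keep waiting". The sketch below shows one way to do that. The payload shape is an assumption: this page only names `old_status` and `new_status`, so treating them as top-level JSON fields, along with the `deploy_outcome` helper and its `kind` parameter, is hypothetical.

```python
import json

# Terminal failure statuses listed on this page.
TERMINAL_FAILURES = {"failed", "crash_loop_back_off", "insufficient_resources"}


def deploy_outcome(raw_body: str, kind: str = "worker") -> str:
    """Classify one deployment.status_changed payload for a waiting client."""
    event = json.loads(raw_body)
    new_status = event["new_status"]  # assumed top-level field
    # A worker is done when it is running; a job only passes through
    # running on its way to completed.
    success = "completed" if kind == "job" else "running"
    if new_status == success:
        return "succeeded"
    if new_status in TERMINAL_FAILURES:
        return "failed"
    return "in_progress"              # includes the retried error statuses
```

Retried statuses such as `image_pull_back_off` are reported as `in_progress` because the reconciler may still recover from them. For a worker, `running` only implies "serving" when the deployment declares readiness checks.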
See also
- Reconciliation: the loop that computes these statuses
- Health checks (design): readiness vs liveness, the gate
- Troubleshooting: what to do when a deployment is stuck in `creating`, `deleted`, `image_pull_back_off`, `crash_loop_back_off`, or `insufficient_resources`
- Subscribe to events with webhooks: push status changes to an endpoint