Deployment status lifecycle

Every deployment carries a single status field: the one value that GET /deployments and ring deployment list surface, and the discriminant that the outbound deployment.status_changed webhook event reports. This page is the canonical reference for what each status means, what moves a deployment between statuses, and which statuses are final.

The status is computed by the reconciler: on every tick it observes the runtime, applies the desired state, and writes back the resulting status. A deployment.status_changed event is emitted whenever a tick lands the deployment on a different status than it had before.
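The emit-on-change rule can be sketched as follows. This is a minimal illustration, not Ring's reconciler: `reconcile_tick`, the `observe_and_apply` callback (standing in for "observe the runtime and apply the desired state") and the `emit` callback are hypothetical names, and the payload's `event` key is an assumption; only `old_status`/`new_status` come from the webhook description on this page.

```python
from typing import Callable, Optional


def reconcile_tick(
    previous: Optional[str],
    observe_and_apply: Callable[[], str],
    emit: Callable[[dict], None],
) -> str:
    """One hypothetical reconciler tick: returns the status to write back."""
    # Observe the runtime and apply the desired state; the callback reports
    # the status this tick lands on.
    current = observe_and_apply()
    # deployment.status_changed is emitted only when the status changed.
    if current != previous:
        emit({
            "event": "deployment.status_changed",
            "old_status": previous,
            "new_status": current,
        })
    return current
```

Two consecutive ticks that both land on running emit a single event, which is what lets a subscriber treat each event as a real transition.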

The fourteen statuses

All values serialize as snake_case, identically on the wire (JSON), in the CLI output, and in the SQLite deployment.status column.
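To make the wire form concrete, here is a hypothetical Python enum of the fourteen values (Ring's own type is not shown on this page and may be structured differently). Each member's value is exactly the snake_case string that appears in JSON, CLI output and the SQLite column.

```python
import json
from enum import Enum


class DeploymentStatus(str, Enum):
    """The fourteen statuses; each value is the snake_case wire form."""
    # Lifecycle states
    PENDING = "pending"
    CREATING = "creating"
    RUNNING = "running"
    COMPLETED = "completed"
    DELETED = "deleted"
    # Failure states
    FAILED = "failed"
    CRASH_LOOP_BACK_OFF = "crash_loop_back_off"
    INSUFFICIENT_RESOURCES = "insufficient_resources"
    IMAGE_PULL_BACK_OFF = "image_pull_back_off"
    CREATE_CONTAINER_ERROR = "create_container_error"
    NETWORK_ERROR = "network_error"
    CONFIG_ERROR = "config_error"
    FILE_SYSTEM_ERROR = "file_system_error"
    ERROR = "error"


def to_wire(status: DeploymentStatus) -> str:
    # Serialize the plain string value, as it would appear in a JSON body.
    return json.dumps({"status": status.value})
```

Parsing goes the other way with `DeploymentStatus("image_pull_back_off")`, which raises `ValueError` for any string that is not one of the fourteen values.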

Lifecycle states

Status    | Meaning
----------|--------
pending   | Created in the database; no container/VM started yet. Short-lived, since the next tick moves it to creating, so it is rarely observed.
creating  | The runtime is bringing instances up. Also the state in which the readiness gate (see below) holds a worker until its readiness checks are green.
running   | Up and, when readiness checks are declared, actually ready (serving). Without readiness checks, running simply means "the container/VM is up". For a job, a transient state on the way to completed/failed.
completed | Jobs only. The one-shot task exited 0 (or, on Cloud Hypervisor, the guest shut down cleanly). Terminal.
deleted   | Marked for teardown (via DELETE /deployments/{id} or ring deployment delete). The reconciler removes every instance, then purges the row.

Failure states

Status                 | Cause | Terminal?
-----------------------|-------|----------
failed                 | A job exited non-zero or crashed; a job's restart_count reached MAX_RESTART_COUNT; a readiness check never turned green before the deadline on a non-rolling deployment; or (Cloud Hypervisor) firmware was not found. | Terminal
crash_loop_back_off    | A worker's container/VM kept dying and restart_count reached MAX_RESTART_COUNT (5). The reconciler stops trying. | Terminal
insufficient_resources | The host doesn't have enough memory for the deployment's request. A retry can't conjure RAM, so Ring stops. | Terminal
image_pull_back_off    | The image couldn't be pulled (tag not found, registry auth, image_pull_policy: Never forbidding a pull, a transient network error). | Retried
create_container_error | The runtime rejected container creation (invalid mount, unsupported option, a port conflict the daemon surfaces at create time). | Retried
network_error          | Creating the namespace network/bridge failed. | Retried
config_error           | A mounted config (or a key within it) doesn't exist in the namespace. | Retried
file_system_error      | An IO error while handling volumes or temp config files. | Retried
error                  | Generic runtime fallback: a stats fetch, a JSON parse, a VM-start failure, or any error not classified above. | Retried

Terminal vs retried. A terminal status is never reconciled again, because the deployment is done. A retried failure status stays in the reconcile loop: each tick re-attempts the apply, bumping restart_count, until either the apply succeeds (back to creating → running) or the counter hits MAX_RESTART_COUNT and the deployment flips to crash_loop_back_off (a worker) or failed (a job). The reconciler explicitly polls pending, creating, running, deleted, and the six retried error states; completed, failed, crash_loop_back_off, and insufficient_resources are left out on purpose.
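The terminal/retried split and the per-tick retry can be sketched like this. The set contents mirror the two tables above; `retry_tick` and its parameters are hypothetical, and placing the limit check at the start of a tick is an assumption based on "the next tick flips" wording in the restart-counter section.

```python
from typing import Tuple

MAX_RESTART_COUNT = 5

# Never polled again once reached.
TERMINAL = frozenset({"completed", "failed", "crash_loop_back_off", "insufficient_resources"})
# Failure statuses that stay in the reconcile loop.
RETRIED = frozenset({
    "image_pull_back_off", "create_container_error", "network_error",
    "config_error", "file_system_error", "error",
})
# Everything the reconciler polls on each tick.
POLLED = frozenset({"pending", "creating", "running", "deleted"}) | RETRIED


def retry_tick(kind: str, status: str, restart_count: int,
               apply_succeeded: bool) -> Tuple[str, int]:
    """One tick for a deployment in a retried status; returns (status, restart_count)."""
    if status not in RETRIED:
        raise ValueError(f"{status} is not a retried status")
    if restart_count >= MAX_RESTART_COUNT:
        # Limit reached on an earlier tick: stop retrying for good.
        return ("crash_loop_back_off" if kind == "worker" else "failed"), restart_count
    if apply_succeeded:
        # Back into the lifecycle; the counter is cumulative, so it is kept.
        return "creating", restart_count
    # Failed again: stay in the same status and bump the counter.
    return status, restart_count + 1
```

Note that TERMINAL and POLLED are disjoint and together cover all fourteen statuses, which is the property the "left out on purpose" sentence relies on.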

Recover from a terminal or stuck state by re-applying a fixed manifest: ring apply resets restart_count and re-enters the lifecycle from the top. The restart counter is cumulative over the deployment's lifetime, not a sliding window.

Worker lifecycle

A kind: worker is a long-running service the reconciler keeps at exactly replicas instances.

pending → creating ──────────────→ running ──→ (stays running, reconciled each tick)
              │     (gate: ready)      │
              │                        ├──→ deleted              (you delete it)
              │←── readiness not green │
              │    (held here)         └──→ crash_loop_back_off  (restart_count ≥ 5)
              │
              └──→ image_pull_back_off / create_container_error / network_error /
                   config_error / file_system_error / error   (retried; → crash_loop_back_off
                   after MAX_RESTART_COUNT, or → running once resolved)

              insufficient_resources  (terminal — host out of memory)
  • creating → running happens as soon as the container/VM is up unless the deployment declares a readiness: true health check, in which case the readiness gate holds it in creating until ready.
  • running is stable. A liveness check failure doesn't move the status; it triggers the check's on_failure action (restart removes the instance and the reconciler recreates it; stop marks the deployment deleted; alert only emits an event). The status is not dragged back to creating once running is established.
  • A worker never reaches completed; that status is jobs-only.
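The worker rules above can be condensed into a per-tick transition function. This is a simplified sketch, not Ring's code: `next_worker_status`, `status_after_liveness_failure` and all their parameters are hypothetical, and the retried error branches are left out for brevity.

```python
MAX_RESTART_COUNT = 5


def next_worker_status(status: str, instance_up: bool, has_readiness_checks: bool,
                       gate_open: bool, restart_count: int) -> str:
    """Hypothetical worker transition for one tick (retried error branches omitted)."""
    if status == "pending":
        return "creating"
    if status == "creating":
        if not instance_up:
            return "creating"
        # With readiness: true checks, the gate holds the worker in creating.
        if has_readiness_checks and not gate_open:
            return "creating"
        return "running"
    if status == "running" and restart_count >= MAX_RESTART_COUNT:
        return "crash_loop_back_off"
    # running is stable; terminal and deleted statuses are left unchanged here.
    return status


def status_after_liveness_failure(action: str) -> str:
    """Status of a running worker after a liveness check's on_failure action."""
    return {
        "restart": "running",  # instance removed and recreated; status unchanged
        "stop": "deleted",     # deployment marked deleted
        "alert": "running",    # only an event is emitted
    }[action]
```

The design point the sketch makes visible is that no branch leads from running back to creating: liveness problems are handled by on_failure actions, not by re-entering the gate.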

Job lifecycle

A kind: job runs one instance to completion (replicas is ignored).

pending → creating → running ──→ completed   (exit 0 / clean guest shutdown)
                          │
                          ├──→ failed         (non-zero exit, OOM, signal, host-side timeout)
                          └──→ failed         (restart_count ≥ MAX_RESTART_COUNT)
  • On Cloud Hypervisor the host can't read the guest's exit code, so any clean VM shutdown is reported as completed. Use a worker if you need precise exit-code semantics on CH.
  • Jobs are exempt from the readiness gate: a job moves from creating to running as soon as its instance is up, then on to completed or failed. It is never held in creating waiting for readiness checks.
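A job's final status can be sketched as a small decision function. The runtime label "cloud_hypervisor", the function name and its parameters are hypothetical, and treating an unclean CH shutdown as failed is an assumption (the page only states that a clean shutdown counts as completed).

```python
from typing import Optional

MAX_RESTART_COUNT = 5


def job_final_status(runtime: str, exit_code: Optional[int] = None,
                     clean_shutdown: bool = False, restart_count: int = 0) -> str:
    """Hypothetical: the terminal status a finished job lands on."""
    if restart_count >= MAX_RESTART_COUNT:
        return "failed"
    if runtime == "cloud_hypervisor":
        # The host never sees the guest's exit code; a clean shutdown is all it has.
        return "completed" if clean_shutdown else "failed"
    # Container runtime: exit 0 completes; non-zero (including OOM or signal) fails.
    return "completed" if exit_code == 0 else "failed"
```

On CH, a guest that shuts down cleanly after its task failed internally still lands on completed here, which is exactly the caveat in the first bullet.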

The readiness gate

A worker that declares at least one readiness: true health check stays in creating until every readiness check has reported success continuously for its min_healthy_time (default 10s, an anti-flap guard). Only then does it become running. This makes running mean that the app is serving, not merely that the process started, which is why an external subscriber waiting for a deploy to finish can trust the deployment.status_changed → running event.
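A minimal sketch of the gate, assuming that each check's "healthy time" is measured from the first success of its current unbroken streak and that any failure resets that streak. `ReadinessGate`, `record` and `is_open` are hypothetical names; the real mechanics are in Health checks (design).

```python
from typing import Dict, Iterable, Optional


class ReadinessGate:
    """Hypothetical gate: opens once every readiness check has been
    succeeding continuously for at least min_healthy_time seconds."""

    def __init__(self, check_names: Iterable[str], min_healthy_time: float = 10.0):
        self.min_healthy_time = min_healthy_time
        # Start of each check's current success streak; None while not green.
        self.green_since: Dict[str, Optional[float]] = {n: None for n in check_names}

    def record(self, name: str, success: bool, now: float) -> None:
        if not success:
            self.green_since[name] = None  # a failure restarts the streak (anti-flap)
        elif self.green_since[name] is None:
            self.green_since[name] = now   # the streak begins at the first success

    def is_open(self, now: float) -> bool:
        return all(
            since is not None and now - since >= self.min_healthy_time
            for since in self.green_since.values()
        )
```

A single failed probe on any check pushes the opening time back by a full min_healthy_time, which is what keeps a flapping service from briefly reporting running.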

While a deployment is in creating, only the readiness checks run, and their results are only recorded for the gate to read: they do not fire on_failure actions, since a probe that is not yet green during boot is not a failure. Liveness checks start only once the deployment is running.

Deadline. A simple deployment (no rolling-update parent) whose readiness never turns green would otherwise sit in creating forever. Past RING_ROLLOUT_DEADLINE (default 600s, the same knob as the rolling-update drain, mirroring Kubernetes' progressDeadlineSeconds) Ring marks it failed with a readiness_deadline_exceeded event. A rolling-update child is exempt here: its deadline is the forced parent drain (the old version keeps serving), described in Reconciliation → rolling updates.
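The deadline rule can be sketched as below. Both function names are hypothetical, and reading RING_ROLLOUT_DEADLINE as an environment variable holding plain seconds is an assumption; this page only calls it a knob with a 600s default.

```python
import os
from typing import Mapping, Optional, Tuple


def rollout_deadline(env: Mapping[str, str] = os.environ) -> float:
    """Deadline in seconds; assumes RING_ROLLOUT_DEADLINE is an env var holding seconds."""
    return float(env.get("RING_ROLLOUT_DEADLINE", "600"))


def check_readiness_deadline(creating_since: float, now: float,
                             has_rolling_parent: bool,
                             deadline: float) -> Optional[Tuple[str, str]]:
    """Return (new_status, event) once a gated deployment overruns, else None."""
    if has_rolling_parent:
        # Rolling-update children are bounded by the forced parent drain instead.
        return None
    if now - creating_since > deadline:
        return "failed", "readiness_deadline_exceeded"
    return None
```

Passing the environment in as a mapping keeps the deadline logic testable without touching the real process environment.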

Without any readiness check, the legacy behaviour is preserved: running as soon as the container is up. See Health checks (design) → the readiness gate for the full mechanics.

Restart counter and crash_loop_back_off

Ring tracks a cumulative restart_count per deployment. It is bumped when:

  • A worker's container dies unexpectedly (Docker die/oom/kill events, or a CH VM going unresponsive), unless the shutdown was intentional (a delete/scale-down).
  • A retried error status re-attempts its apply and fails again.

Once restart_count reaches MAX_RESTART_COUNT (5), the next tick flips a worker to crash_loop_back_off (terminal) and a job to failed (terminal): the reconciler stops retrying, protecting the host from a tight crash loop. The counter is cumulative for the deployment's lifetime, not a sliding window; ring apply with a fixed manifest resets it.
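The bookkeeping described above fits in a small class. `RestartCounter` and its method names are hypothetical; the sketch only encodes the stated rules: intentional shutdowns don't count, unexpected deaths and failed retries do, the limit maps to a kind-specific terminal status, and re-applying resets the counter.

```python
from typing import Optional

MAX_RESTART_COUNT = 5


class RestartCounter:
    """Hypothetical per-deployment restart bookkeeping."""

    def __init__(self) -> None:
        self.restart_count = 0

    def instance_exited(self, intentional: bool) -> None:
        # Deletes and scale-downs are intentional and never count.
        if not intentional:
            self.restart_count += 1

    def retry_failed(self) -> None:
        # A retried error status re-attempted its apply and failed again.
        self.restart_count += 1

    def limit_status(self, kind: str) -> Optional[str]:
        """Terminal status the next tick flips to, or None while under the limit."""
        if self.restart_count >= MAX_RESTART_COUNT:
            return "crash_loop_back_off" if kind == "worker" else "failed"
        return None

    def reapplied(self) -> None:
        # ring apply with a fixed manifest resets the cumulative counter.
        self.restart_count = 0
```

Because nothing decrements the counter except `reapplied`, five crashes spread over a month trip the limit just like five crashes in a minute; that is the "not a sliding window" behaviour.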

Counters live in memory only: restarting ring server clears them, and each (deployment, instance, check) triple starts back at zero.

Observing the status

  • API: GET /deployments and GET /deployments/{id} return the status field; filter with GET /deployments?status=<value>. See API reference → Deployments.
  • CLI: ring deployment list shows a Status column; --status <value> (repeatable) filters. See CLI reference.
  • Events: ring deployment events <id> shows the per-transition history (state changes, health-check actions, error reasons like image_pull_back_off or readiness_deadline_exceeded).
  • Webhooks: subscribe to deployment.status_changed to be pushed every transition (old_status → new_status) instead of polling. See Subscribe to events with webhooks.

See also