Troubleshooting
When something breaks, work outside-in: confirm Ring is responsive, then look at what Ring decided, then at what the application did. The full debugging flow is in how-to: observe and debug; this page collects the specific errors and their fixes.
Server won't start
"Failed to connect to Docker daemon"
The Docker daemon isn't running, or your user lacks permissions.
sudo systemctl start docker
sudo usermod -aG docker $USER # then log out and back in
docker ps # should now work
"Permission denied" on /var/run/docker.sock
Your user is not in the docker group.
sudo usermod -aG docker $USER # then log out and back in
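Group changes made by usermod only reach new login sessions, so it can help to tell "not in the group" apart from "added, but this session predates the change". The following is a minimal sketch, assuming an `id` that supports `-nG` (GNU coreutils does); `has_group` and `docker_group_status` are illustrative helper names, not part of Ring or Docker.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# has_group LIST NAME: succeed if NAME is one of the space-separated words in LIST.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# docker_group_status: report how the "docker" group applies to the current user.
docker_group_status() {
  user=$(id -un)
  session_groups=$(id -nG)          # groups held by the running shell
  account_groups=$(id -nG "$user")  # groups recorded for the account
  if has_group "$session_groups" docker; then
    echo "active"
  elif has_group "$account_groups" docker; then
    echo "pending: log out and back in"
  else
    echo "missing: run sudo usermod -aG docker $user"
  fi
}
```

Calling `docker_group_status` in the shell that shows the error should print one of the three states; "pending" means the usermod step already worked and only a fresh login is missing.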
"Port 3030 already in use"
Another process owns the port. Either stop it:
sudo ss -tlnp | grep ':3030' # find the PID
sudo kill <PID> # stop it
Or change Ring's port in ~/.config/kemeter/ring/config.toml. The schema requires current, host, and api:
[contexts.default]
current = true
host = "127.0.0.1"
api.scheme = "http"
api.port = 3031 # was 3030
See reference: config.toml for every field.
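For the first option above (stopping whatever owns the port), a plain grep for 3030 also matches ports such as 30300. As a rough sketch, ss can filter on the exact source port and a small helper can pull the PIDs out of its `users:(...)` column; `pids_from_ss` is a hypothetical name introduced here.

```shell
#!/bin/sh
# pids_from_ss: read `ss -tlnp` output on stdin and print each owning PID once.
# Relies on the pid=<n> field that ss prints inside users:((...)).
pids_from_ss() {
  grep -o 'pid=[0-9][0-9]*' | cut -d= -f2 | sort -un
}

# Usage (root is needed to see processes owned by other users):
#   sudo ss -tlnp 'sport = :3030' | pids_from_ss
```

Check what each PID is (for example with `ps -p <PID>`) before stopping it.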
"RING_SECRET_KEY is required" / exits with code 1
Ring refuses to start without RING_SECRET_KEY. Generate one and export it before starting:
export RING_SECRET_KEY="$(openssl rand -base64 32)"
ring server start
Validate without starting the server:
ring doctor
For a managed service, the key belongs in a systemd EnvironmentFile=, not in your interactive shell. See how-to: run as a service.
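For the managed-service case, one possible way to prepare such a file is sketched below. It assumes the file only needs a single `RING_SECRET_KEY=<value>` line, which is the format systemd's EnvironmentFile= accepts; the helper names and any path you pass in are hypothetical and not defined by Ring.

```shell
#!/bin/sh
# new_secret_key: print 32 random bytes, base64-encoded
# (the same shape as the openssl command shown above).
new_secret_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 32
  else
    head -c 32 /dev/urandom | base64
  fi
}

# write_key_file PATH: create PATH, readable by its owner only, holding the key.
# Refuses to overwrite an existing file so a key is never replaced by accident.
write_key_file() {
  if [ -e "$1" ]; then
    echo "refusing to overwrite $1" >&2
    return 1
  fi
  ( umask 077 && printf 'RING_SECRET_KEY=%s\n' "$(new_secret_key)" > "$1" )
}
```

For example, `sudo sh -c '. ./keyfile.sh; write_key_file /etc/ring/ring.env'` would create the file as root, assuming the functions are saved in a script named keyfile.sh; both that script name and the /etc/ring/ring.env path are placeholders.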
Cloud Hypervisor VMs die with SIGSYS at boot
CH's default seccomp filter doesn't whitelist a syscall the boot path needs on some recent kernels. Symptom in the Ring log:
cloud-hypervisor process for ch-... exited with signal: 31 (SIGSYS) (core dumped)
stderr: ==== Possible seccomp violation ====
Workaround:
[server.runtime.cloud_hypervisor]
enabled = true
seccomp = "false" # disable filter
# or
# seccomp = "log" # keep filter, only log violations
Leave seccomp unset in production unless you've actually hit this.
Authentication fails
"Invalid credentials" right after ring init
The default credentials are admin / changeme, but only on the first server start before the password is changed. If you've changed the password and forgotten it, the only path forward is to delete the user from the database and recreate it via ring user create, since Ring has no password-reset workflow.
"Unauthorized" on every command
The token in ~/.config/kemeter/ring/auth.json is invalid. Log in again:
ring login --username admin --password "your-password"
"Could not connect to localhost"
Check that the server is listening on the address the CLI thinks it should:
curl http://localhost:3030/healthz # should return {"state":"UP"}
Ring binds to its detected local IP by default (e.g. 192.168.1.x), not localhost. The CLI uses whatever host value is in your config.toml. If they disagree, either change host to 127.0.0.1 for loopback-only, or point the CLI at the actual bind IP.
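To find out which address actually answers, the health endpoint can be probed on a few candidates. This is a sketch only: `is_up` and `find_ring` are hypothetical helpers, and the candidate list is whatever you pass in, since how Ring detects its local IP is not described here.

```shell
#!/bin/sh
# is_up BODY: succeed if a /healthz response body reports state UP.
is_up() {
  printf '%s' "$1" | grep -Eq '"state"[[:space:]]*:[[:space:]]*"UP"'
}

# find_ring PORT HOST...: print the first host whose /healthz answers UP.
find_ring() {
  port=$1
  shift
  for host in "$@"; do
    body=$(curl -fsS --max-time 2 "http://$host:$port/healthz" 2>/dev/null) || continue
    if is_up "$body"; then
      echo "$host"
      return 0
    fi
  done
  return 1
}

# Usage: find_ring 3030 127.0.0.1 192.168.1.20
```

Whatever host this prints is the value the CLI's host setting in config.toml needs to match.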
Deployment problems
For what each deployment status means and how a deployment moves between them, see Deployment status lifecycle.
Stuck in creating
Look at the events first:
ring deployment events <DEPLOYMENT_ID> --level error
Most common causes:
- SecretResolutionError: a secretRef in environment: refers to a secret that doesn't exist in the deployment's namespace. Create the secret, or fix the manifest.
- ImagePullBackOff: Docker can't pull the image: wrong tag, missing credentials, or a network problem. Verify with docker pull <image> from the host.
- InstanceCreationFailed: Docker rejected the container creation. Common subreasons: port conflict (bind: address already in use), invalid bind mount (source path missing), unsupported runtime option.
Stuck in deleted
A deployment shown as deleted should disappear within a scheduler cycle or two once its containers are gone, because the row is purged automatically. If it lingers and ring deployment events <ID> keeps logging secret_resolution_error / config_load_error / volume_resolution_error, you are on a build that predates the fix. In those builds the scheduler tried to resolve a deployment's secrets, configs, and volumes before tearing it down. A resource deleted alongside the deployment (e.g. the secret a secretRef pointed at) then made resolution fail on every tick, so cleanup was never reached.
Current builds reconcile a deleted deployment straight to teardown and purge, ignoring secret/config/volume resolution. If you hit this on an older server, upgrade; the stuck row clears on the next cycle after restart.
image_pull_back_off
ring deployment events <DEPLOYMENT_ID> --level error --limit 20
The event message names the likely cause and the fix, and keeps Docker's exact rejection in (original error: …) for the full detail. Three cases are distinguished:
- … not found …: the tag or digest doesn't exist in the registry (or image_pull_policy: Never and the image isn't cached locally). Check the image reference.
- registry authentication failed … — check config.server, config.username and config.password: the registry refused the credentials (or required some and none were sent). Fix the credentials as described below.
- cannot reach the registry … — is it up and the registry host correct?: a transport failure (connection refused, host not found, timeout). The registry host is wrong or down; verify with docker pull <image> from the host.
For private registries, set config.server / config.username / config.password in the manifest:
config:
  server: "registry.company.com"
  username: "registry-user"
  password: "$REGISTRY_PASSWORD"
  image_pull_policy: "Always"
See how-to: deploy with secrets → private registry credentials.
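The three cases can be told apart mechanically from the event message. The sketch below matches only on the message fragments quoted in this section; `classify_pull_error` is a hypothetical helper, and real messages may be worded differently around those fragments.

```shell
#!/bin/sh
# classify_pull_error MESSAGE: print which of the three documented cases applies.
# The transport case is tested before "not found" because its original error
# can itself contain "host not found".
classify_pull_error() {
  case $1 in
    *"registry authentication failed"*) echo "credentials" ;;
    *"cannot reach the registry"*)      echo "unreachable" ;;
    *"not found"*)                      echo "reference" ;;
    *)                                  echo "unknown" ;;
  esac
}

# Usage: classify_pull_error "<message copied from ring deployment events>"
```

"reference" points at the image tag or digest, "credentials" at the config.* values in the manifest, and "unreachable" at the registry host or the network.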
crash_loop_back_off
The container has crashed more than MAX_RESTART_COUNT times. Look at:
ring deployment events <DEPLOYMENT_ID> --level warning # crashes show up here
ring deployment logs <DEPLOYMENT_ID> --tail 200 # what the app said
After fixing the root cause, re-apply the manifest. Ring resets the counter on a fresh apply.
insufficient_resources
The host doesn't have enough free memory to honour the deployment's requested memory, so Ring refused to start it, before creating any container or booting any VM. The event names the gap:
ring deployment events <DEPLOYMENT_ID> --level error --limit 5
# insufficient host memory for 'web': needs 4096 MiB but only 1800 MiB is available — …
This status is terminal: Ring does not retry, because the memory isn't going to reappear on its own. Two ways out:
- Free memory on the host (stop other workloads), then re-apply the manifest.
- Lower the deployment's resources.requests.memory (or resources.limits.memory if no request is set) to fit, then re-apply.
The check compares against memory available at that moment; it's a guard against gross over-asks, not a precise reservation system. CPU is not gated; CPU overcommit is allowed.
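Before re-applying, the same kind of comparison can be approximated on the host. This sketch assumes Linux and uses the MemAvailable field of /proc/meminfo as a stand-in; how Ring itself measures available memory is not specified here, so treat the result as an estimate. Both helper names are hypothetical.

```shell
#!/bin/sh
# available_mib: read /proc/meminfo-style text on stdin, print MemAvailable in MiB.
available_mib() {
  awk '/^MemAvailable:/ { print int($2 / 1024); exit }'
}

# memory_gap_mib NEEDED AVAILABLE: print the shortfall in MiB (0 if the request fits).
memory_gap_mib() {
  if [ "$1" -gt "$2" ]; then
    echo $(( $1 - $2 ))
  else
    echo 0
  fi
}

# Usage: memory_gap_mib 4096 "$(available_mib < /proc/meminfo)"
```

With the figures from the event above (4096 MiB needed, 1800 MiB available) the gap is 2296 MiB, which is how much to free or to trim from the request.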
Health checks flap
ring deployment health-checks <DEPLOYMENT_ID> # full history
ring deployment health-checks <DEPLOYMENT_ID> --latest # one row per check
Common causes:
- timeout: the probe's timeout is shorter than the application's actual response time. Increase it, or fix the slow path.
- failed with HTTP: a 3xx response counts as failure (Ring doesn't follow redirects). Point the URL at the redirect target.
- failed with TCP: the port isn't open inside the container yet. Either bump threshold, or use a TCP/HTTP check on an endpoint that signals real readiness.
- failed flapping between hosts: interval is currently advisory; probes actually run once per scheduler tick (default 10s). Lower the tick (RING_SCHEDULER_INTERVAL=2) for a tighter cadence.
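The first two causes in the list above can be checked by hand with curl, which does not follow redirects unless asked to and can report both status code and elapsed time. The helper below is a sketch: `probe_hint` is a hypothetical name, and it only flags the slow-response and 3xx cases described above without claiming to reproduce Ring's own pass/fail rule.

```shell
#!/bin/sh
# probe_hint HTTP_CODE ELAPSED_SECONDS TIMEOUT_SECONDS
# Prints "slow" if the response took longer than the timeout,
# "redirect" for a 3xx status, and "none" otherwise.
probe_hint() {
  if awk -v e="$2" -v t="$3" 'BEGIN { exit !(e > t) }'; then
    echo "slow"
  elif [ "$1" -ge 300 ] && [ "$1" -lt 400 ]; then
    echo "redirect"
  else
    echo "none"
  fi
}

# Usage (URL is the health check URL from your manifest, 2 is its timeout in seconds):
#   set -- $(curl -s -o /dev/null --max-time 10 -w '%{http_code} %{time_total}' "$URL")
#   probe_hint "$1" "$2" 2
```

Running it a few times in a row gives a feel for whether the response time sits close to the configured timeout.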
Rolling update stuck on the new version
ring deployment list --status failed
ring deployment events <CHILD_ID> --level error
ring deployment health-checks <CHILD_ID>
ring deployment logs <CHILD_ID> --tail 200
If the child never goes healthy, Ring leaves the parent running. Fix the manifest and re-apply (this creates a new child and ignores the failed one), or run ring deployment delete <CHILD_ID> to clear it explicitly.
To roll back, set the image tag to the previous version and re-apply; Ring rolls forward to the older tag through the same rolling-update path.
"Multiple active deployments share name+namespace"
A previous failed rollout left the parent and a stuck child both running. List the duplicates:
ring deployment list -n <NAMESPACE>
Delete the unwanted one with ring deployment delete <ID>. Ring falls back to immediate replacement until the duplicates are gone.
Cloud Hypervisor specifics
"Failed to create TAP"
Cloud Hypervisor needs CAP_NET_ADMIN:
sudo setcap cap_net_admin,cap_net_raw+ep $(which cloud-hypervisor)
getcap $(which cloud-hypervisor)
Re-run after every CH upgrade, since setcap doesn't survive a new binary.
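Because the capability silently disappears when the binary is replaced, a quick check after each upgrade can save a failed boot. The sketch below only looks for the capability name in getcap's output, which covers both the older `path = caps+ep` and the newer `path caps=ep` output styles; `has_net_admin` is a hypothetical helper.

```shell
#!/bin/sh
# has_net_admin GETCAP_OUTPUT: succeed if the output lists cap_net_admin.
# An empty string (no file capabilities set) fails the check.
has_net_admin() {
  printf '%s\n' "$1" | grep -q 'cap_net_admin'
}

# Usage:
#   has_net_admin "$(getcap "$(command -v cloud-hypervisor)")" || echo "re-run setcap"
```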
VM boots but is unreachable on its published port
socat isn't installed on the host. Each ports: entry needs a socat forwarder; without it the VM boots fine but no host port is bound.
sudo apt install socat # or dnf install socat
ring doctor # confirms socat presence
environment: is empty inside the VM
Either xorriso isn't installed on the host, or the guest image doesn't ship cloud-init.
sudo apt install xorriso
ring doctor
Custom guest images from scratch (e.g. Buildroot) won't pick up env vars unless you add cloud-init or read /etc/ring/env yourself in your boot scripts.
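For the read-it-yourself route, a boot script could load the file along these lines. The format of /etc/ring/env is not documented here, so this sketch assumes plain KEY=VALUE lines with optional # comments; verify that against a real guest before relying on it. `load_env_file` is a hypothetical helper.

```shell
#!/bin/sh
# load_env_file PATH: export every KEY=VALUE line found in PATH.
# Assumed format: one assignment per line, blank lines and # comments ignored.
# Values are taken literally (no quote removal), so trailing "=" padding survives.
load_env_file() {
  [ -r "$1" ] || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|\#*) continue ;;   # skip blanks and comments
      *=*) ;;               # looks like an assignment
      *) continue ;;        # ignore anything else
    esac
    export "${line%%=*}=${line#*=}"
  done < "$1"
}

# Usage in a boot script: load_env_file /etc/ring/env
```

Parsing the lines instead of sourcing the file avoids executing whatever shell syntax a value might contain.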
command: health check rejected
Either the in-guest ring-agent isn't running, or the agent's cap_net_admin / vsock module isn't enabled. See Cloud Hypervisor → prerequisites.
Generic diagnostic flow
ring doctor is the first-line check: it verifies Docker connectivity, the encryption key, and Cloud Hypervisor prerequisites (binary, KVM, firmware, virtiofsd, xorriso, socat).
ring doctor
After that, the full debugging order from how-to: observe and debug:
curl http://localhost:3030/healthz # is Ring up?
ring deployment list # what's the state?
ring deployment events <ID> --level error --limit 50 # what did Ring decide?
ring deployment health-checks <ID> --latest # are probes passing?
ring deployment logs <ID> --tail 200 # what did the app say?
ring deployment metrics <ID> # resource pressure?
Then, if you've ruled out Ring:
docker ps --filter "label=ring_deployment=$DEPLOYMENT_ID"
docker logs <CONTAINER_ID>
docker inspect <CONTAINER_ID>
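The Ring-side part of this order (health endpoint through metrics) is easy to wrap so that it always runs in the same sequence. The following is a sketch that uses only the commands listed above; `run`, `diagnose`, and the DRY_RUN switch are hypothetical conveniences, and the localhost:3030 address should be adjusted to your bind address.

```shell
#!/bin/sh
# run CMD...: execute CMD, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# diagnose ID: walk the outside-in order for one deployment.
# Stops early if Ring itself is not answering, since the rest would fail too.
diagnose() {
  id=$1
  run curl -fsS http://localhost:3030/healthz || {
    echo "Ring is not answering; check the server first" >&2
    return 1
  }
  run ring deployment list
  run ring deployment events "$id" --level error --limit 50
  run ring deployment health-checks "$id" --latest
  run ring deployment logs "$id" --tail 200
  run ring deployment metrics "$id"
}

# Usage: diagnose <ID>            (runs the commands)
#        DRY_RUN=1 diagnose <ID>  (only prints them)
```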
Server logs
sudo journalctl -u ring -f # systemd
RUST_LOG=info ring server start # foreground
RUST_LOG=ring=debug ring server start # all Ring components
RUST_LOG=ring::scheduler=debug ring server start # one component