Auto-healing
A cron daemon on the relay server monitors every service and recovers failures automatically.
The auto-healing daemon runs continuously on the relay server alongside the admin panel. When a server fails its health check, the daemon re-runs the full recovery sequence, then repeats the health check to confirm the server is back online. No manual intervention is required.
Zero config
Uses the same healthcheck and run hooks already defined in config.yaml.
Full recovery
Runs the preflight → start → postflight cycle on every failing server.
services:
  web:
    run:
      preflight:
        - bun install
      start:
        - pm2 start index.js --name web
      postflight:
        - pm2 list
    healthcheck:
      cmd: curl http://localhost:3000/api/healthcheck
      test: Server Running
How it works
What happens when a server fails its health check
The daemon polls each service on a recurring schedule. A single failed check triggers the full three-stage recovery sequence.
01
Health check runs
The daemon runs the configured healthcheck.cmd against every server in every service. If the output contains healthcheck.test, the server is considered healthy.
02
Recovery sequence fires
If the check fails, the daemon re-runs the full run.preflight → run.start → run.postflight sequence on that server — the same commands that ran during the original deployment.
03
Confirmation check
After the recovery sequence completes, the health check runs again to confirm the server is back online.
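The three steps above can be sketched as a single function. This is a minimal illustration, not the daemon's actual implementation: the names `shell`, `is_healthy`, and `heal` are hypothetical, the service is passed in as a plain dictionary shaped like the config, and commands run locally through a swappable runner that stands in for however the daemon executes commands on the target server.

```python
import subprocess
from typing import Callable, Dict

# Hypothetical runner type: takes a shell command, returns its output as text.
Runner = Callable[[str], str]


def shell(cmd: str) -> str:
    """Run a command through the local shell and return stdout plus stderr."""
    done = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr


def is_healthy(service: Dict, run: Runner = shell) -> bool:
    """Step 01: healthy when the output of healthcheck.cmd contains healthcheck.test."""
    check = service["healthcheck"]
    return check["test"] in run(check["cmd"])


def heal(service: Dict, run: Runner = shell) -> str:
    """Check, recover if needed, then confirm. Returns a status string."""
    if is_healthy(service, run):
        return "healthy"
    # Step 02: re-run the same commands used at deploy time, in stage order.
    for stage in ("preflight", "start", "postflight"):
        for cmd in service["run"].get(stage, []):
            run(cmd)
    # Step 03: confirmation check.
    return "recovered" if is_healthy(service, run) else "still failing"
```

Passing the runner as a parameter keeps the check and recovery logic separate from the transport, so the same function can be exercised with a fake runner that simulates a server coming back up after the start stage.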
Recovery stages
The same three-stage sequence from your config
No additional configuration is required. The daemon reuses the run and healthcheck blocks already defined for each service.
Stage 1
Preflight
Re-runs preflight commands from config: rebuilds, reinstalls, migrations, or anything that must complete before the process starts.
Stage 2
Start
Re-runs start commands to bring the application process back up on the failed server.
Stage 3
Postflight
Re-runs postflight commands for verification, reloads, or any follow-up checks after the process is started.
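To make the stage ordering concrete, the sketch below flattens a run block into the list of commands a recovery pass would execute. The function name `recovery_plan` is hypothetical, and skipping a stage that is absent from the config is an assumption of this example, not documented behavior.

```python
from typing import Dict, List, Tuple

# Stage order as described above: preflight, then start, then postflight.
STAGES = ("preflight", "start", "postflight")


def recovery_plan(run_block: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten a run block into (stage, command) pairs in execution order."""
    return [(stage, cmd) for stage in STAGES for cmd in run_block.get(stage, [])]


# The key order in the config does not matter; the stage order is fixed.
plan = recovery_plan({
    "postflight": ["pm2 list"],
    "start": ["pm2 start index.js --name web"],
    "preflight": ["bun install"],
})
for stage, cmd in plan:
    print(f"{stage}: {cmd}")
# preflight: bun install
# start: pm2 start index.js --name web
# postflight: pm2 list
```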
Configuration
Nothing extra to configure
Auto-healing is enabled automatically on every deployment. The daemon reads the same config.yaml used at deploy time.
services:
  api:
    instances: 2
    clusters: 2
    run:
      preflight:
        - bun install --production
      start:
        - pm2 start dist/server.js --name api
      postflight:
        - pm2 list
    healthcheck:
      cmd: curl -s http://localhost:4000/health
      test: ok
Any server that crashes, runs out of memory, or encounters a transient error will be brought back to a healthy state automatically. The healthcheck, run.preflight, run.start, and run.postflight fields drive both the initial deployment and the recovery process.
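Because recovery depends on those fields, it can be useful to check that a service defines them before relying on auto-healing. The sketch below is a hypothetical helper, not part of the tool: `missing_fields` and the `REQUIRED` list are names invented here, and treating all five fields as expected is an assumption, since the source does not say whether a stage may be left empty.

```python
from typing import Dict, List

# Assumption: these are the fields recovery relies on, per the paragraph above.
REQUIRED = (
    "healthcheck.cmd",
    "healthcheck.test",
    "run.preflight",
    "run.start",
    "run.postflight",
)


def missing_fields(service: Dict) -> List[str]:
    """Return the dotted paths that are absent or empty in a service block."""
    missing = []
    for path in REQUIRED:
        node = service
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if not node:  # None, empty string, or empty list
            missing.append(path)
    return missing


# The api service from the example config, written as a Python dictionary.
api = {
    "instances": 2,
    "clusters": 2,
    "run": {
        "preflight": ["bun install --production"],
        "start": ["pm2 start dist/server.js --name api"],
        "postflight": ["pm2 list"],
    },
    "healthcheck": {"cmd": "curl -s http://localhost:4000/health", "test": "ok"},
}
print(missing_fields(api))  # []
```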
Next step
Take the same setup across providers
Once the deployment shape is clear, duplicate it to another cloud provider with a single config override.