Home/Docs/Auto-Healing

Auto-healing

A cron daemon on the relay server monitors every service and recovers failures automatically.

The auto-healing daemon runs continuously on the relay server alongside the admin panel. When a server fails its health check, the daemon re-runs the full recovery sequence and confirms the server is back online — with no manual intervention required.


Zero config

uses the same healthcheck and run hooks already defined in config.yaml

Full recovery

preflight → start → postflight cycle on every failing server


healthcheck-config
yaml
services: web: run: preflight: - bun install start: - pm2 start index.js --name web postflight: - pm2 list healthcheck: cmd: curl http://localhost:3000/api/healthcheck test: Server Running

How it works

What happens when a server fails its health check

The daemon polls each service on a recurring schedule. A single failed check triggers the full three-stage recovery sequence.


01

Health check runs

The daemon runs the configured healthcheck.cmd against every server in every service. If the output contains healthcheck.test, the server is considered healthy.

02

Recovery sequence fires

If the check fails, the daemon re-runs the full run.preflight run.start run.postflight sequence on that server — the same commands that ran during the original deployment.

03

Confirmation check

After the recovery sequence completes, the health check runs again to confirm the server is back online.


Recovery stages

The same three-stage sequence from your config

No additional configuration is required. The daemon reuses the run and healthcheck blocks already defined for each service.


Stage 1

Preflight

Re-runs preflight commands from config: rebuilds, reinstalls, migrations, or anything that must complete before the process starts.

Stage 2

Start

Re-runs start commands to bring the application process back up on the failed server.

Stage 3

Postflight

Re-runs postflight commands for verification, reloads, or any follow-up checks after the process is started.


Configuration

Nothing extra to configure

Auto-healing is enabled automatically on every deployment. The daemon reads the same config.yaml used at deploy time.


config.yaml
yaml
services: api: instances: 2 clusters: 2 run: preflight: - bun install --production start: - pm2 start dist/server.js --name api postflight: - pm2 list healthcheck: cmd: curl -s http://localhost:4000/health test: ok

Any server that crashes, runs out of memory, or encounters a transient error will be brought back to a healthy state automatically. The healthcheck, run.preflight, run.start, and run.postflight fields drive both the initial deployment and the recovery process.

Next step

Take the same setup across providers

Once the deployment shape is clear, duplicate it to another cloud provider with a single config override.

Continue to multi-cloud setup