# Architecture

## Two processes, one Postgres

The web service and the worker are independent systemd units that share state only through Postgres and the on-disk data directory:

| Process | Owns | Touches |
| --- | --- | --- |
| `zynd-deployer-web.service` | HTTP requests, uploads, dashboard rendering | Deployment row inserts, encrypted blob writes, log SSE fan-out from DB |
| `zynd-deployer-worker.service` | Docker, Caddy, port allocation, health, metrics, log tailing, retention GC | Deployment row updates, PortAllocation, log + metric inserts |

A web restart never disturbs running containers. A worker restart resumes mid-state-machine — the deployment table is the only durable state.

## Upload pipeline

Everything sensitive is encrypted with age before it hits disk. The keypair never appears unencrypted on the host outside the worker's tmpfs scratch directory.
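
The write path this implies is small. A sketch of the web-side handler, assuming hypothetical `encryptToFile` and `prisma` import paths (these names are illustrative, not the literal source):

```ts
import { randomUUID } from "node:crypto";
import { prisma } from "../lib/db";            // assumed path to the Prisma client
import { encryptToFile } from "../lib/crypto"; // assumed wrapper over the age CLI (see below)

// Hypothetical upload handler: ciphertext is the only thing that hits disk.
export async function handleUpload(zip: Buffer, slug: string): Promise<string> {
  const id = randomUUID(); // real ids are cuids; a UUID stands in here
  await encryptToFile(zip, `blobs/${id}.zip.age`);
  await prisma.deployment.create({ data: { id, slug, status: "queued" } });
  return id; // the worker's drainQueue takes it from here
}
```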

## Worker loop

worker/main.ts runs three concerns: the queue drainer, the crash watcher (see "Crash detection" below), and the per-container health, metric, and log tailers.

drainQueue claims the oldest queued row by atomically updating status="starting", then hands it to lifecycle.drive(). The status update doubles as a lease — two workers running against the same DB will never claim the same row twice.
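
That claim can be a plain compare-and-swap. A sketch with Prisma (`claimNext` and the exact query shape are assumptions, not the literal worker/main.ts code):

```ts
// Claim the oldest queued deployment. updateMany returns a count, so a
// concurrent worker that loses the race sees count === 0 and moves on.
async function claimNext() {
  const oldest = await prisma.deployment.findFirst({
    where: { status: "queued" },
    orderBy: { createdAt: "asc" },
  });
  if (!oldest) return null;
  const { count } = await prisma.deployment.updateMany({
    where: { id: oldest.id, status: "queued" }, // CAS: only if still queued
    data: { status: "starting" },
  });
  return count === 1 ? oldest : null; // count 0: another worker got it first
}
```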

## Deployment state machine

```
queued ──► unpacking ──► writing_config ──► allocating_port ──► building
                                                                    │
   ┌────────────────────────────────────────────────────────────────┘
   ▼
starting ──► health_checking ──► registering_route ──► running
                                                          │
   ┌──────────────────────────────────────────────────────┤
   ▼                                                      │
unhealthy ◀── 3 consecutive /health failures              │
   │                                                      │
   └─► running (recovers)                                 │
                                                          │
crashed ◀── docker events: container died ────────────────┤
stopped ◀── user clicked Stop in UI ──────────────────────┘
failed  ◀── any pre-running stage threw (validator, port, image, etc.)
```

Each transition is a single prisma.deployment.update() followed by a system-log row appended via appendSystemLog() — that's how the dashboard timeline stays in sync.
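
A plausible shape for that helper (the name `transition` is an assumption; the two calls are the ones named above):

```ts
// Assumed shape of a lifecycle transition: one row update plus one timeline entry.
async function transition(id: string, status: string, message?: string) {
  await prisma.deployment.update({ where: { id }, data: { status } });
  await appendSystemLog(id, message ?? `state: ${status}`); // feeds the dashboard timeline
}
```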

| State | Owner | Notes |
| --- | --- | --- |
| `queued` | web | Inserted by upload handler. |
| `unpacking` | worker | `age -d blobs/<id>.zip.age` → unzip into `workdirs/<id>/`. |
| `writing_config` | worker | `runtime.ts` injects `ZYND_ENTITY_URL`, `ZYND_REGISTRY_URL`, `ZYND_WEBHOOK_PORT` into `.env` and `agent.config.json`. |
| `allocating_port` | worker | `ports.allocate()` — atomic insert into PortAllocation for an unused port in [13000, 14000]. |
| `building` | worker | Runtime-specific — `pip install` for Python, `pnpm install` for Node. |
| `starting` | worker | `docker run` with mem/CPU limits, port binding to `127.0.0.1:<port>`. |
| `health_checking` | worker | Poll `http://127.0.0.1:<port>/health` up to 30 times with 1 s spacing. |
| `registering_route` | worker | Caddy admin API call to add `<slug>.deployer.<wildcard>` → `127.0.0.1:<port>`. |
| `running` | worker | Steady state. |
| `unhealthy` | worker | 3 consecutive `/health` failures while running. Container untouched — recovers if probes pass again. |
| `crashed` | worker | Captured by `crash.ts` watching `docker events`. `lastExitCode` populated. |
| `stopped` | worker | User clicked Stop. Container, route, port released. |
| `failed` | worker | Any unhandled error before `running`. `errorMessage` captured. |

Failures after `running` go to `crashed` or `unhealthy`, never `failed`; `failed` is reserved for deployments that never started.
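
`ports.allocate()` from the table above can be nothing more than an insert racing against a unique index. A sketch under that assumption (the loop and error handling are illustrative; P2002 is Prisma's unique-constraint violation code):

```ts
// A unique constraint on PortAllocation.port makes the insert itself the lock.
// A competing worker hitting the same port gets P2002 and tries the next one.
async function allocate(deploymentId: string): Promise<number> {
  for (let port = 13000; port <= 14000; port++) {
    try {
      await prisma.portAllocation.create({ data: { port, deploymentId } });
      return port;
    } catch (e: any) {
      if (e?.code !== "P2002") throw e; // not a duplicate-port collision
    }
  }
  throw new Error("no free port in [13000, 14000]");
}
```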

## Encryption pipeline

src/lib/crypto.ts is a thin wrapper around the age CLI:

```ts
import { spawn } from "node:child_process";

// encrypt: plaintext on stdin, ciphertext written to outPath
spawn("age", ["-r", recipient, "-o", outPath]).stdin.end(blob);
// decrypt: ciphertext on stdin, plaintext on stdout
spawn("age", ["-d", "-i", AGE_IDENTITY_PATH]).stdin.end(encryptedBytes);
```
  • Master key — master.age, generated by infra/install.sh, lives at AGE_IDENTITY_PATH. Owned by the zynd system user, mode 600.
  • Recipient — same key (used as the public recipient for encryption). Symmetric in effect.
  • Format — stock age — interoperable with anything else operators already know.

Why shell out instead of binding a JS library? Operators can verify, decrypt, and rotate blobs with the standard age binary they already trust.

## Caddy integration

worker/caddy.ts talks to Caddy's admin API at CADDY_ADMIN_URL (default http://127.0.0.1:2019):

  • ensureServer() — confirms the wildcard server config exists at startup.
  • addRoute(slug, port) — POSTs a route block matching <slug>.deployer.<wildcard> to a reverse_proxy 127.0.0.1:<port>.
  • removeRoute(slug) — DELETEs the route.
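
A sketch of addRoute against Caddy's JSON config API (the route shape is standard Caddy; the server name `deployer` and the `example.com` zone are assumptions):

```ts
// POST a new route to the running Caddy instance over the admin API.
// It matches the tenant host and proxies to the allocated local port.
async function addRoute(slug: string, port: number): Promise<void> {
  const route = {
    "@id": `deploy-${slug}`, // tag so removal is a single DELETE /id/deploy-<slug>
    match: [{ host: [`${slug}.deployer.example.com`] }], // stands in for the wildcard zone
    handle: [{ handler: "reverse_proxy", upstreams: [{ dial: `127.0.0.1:${port}` }] }],
  };
  const admin = process.env.CADDY_ADMIN_URL ?? "http://127.0.0.1:2019";
  const res = await fetch(`${admin}/config/apps/http/servers/deployer/routes`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(route),
  });
  if (!res.ok) throw new Error(`caddy addRoute failed: ${res.status}`);
}
```

Tagging routes with `@id` lets removeRoute issue one DELETE against `/id/deploy-<slug>` instead of searching the routes array for a match.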

TLS is wildcard via DNS-01 against Cloudflare (infra/Caddyfile), so per-tenant slugs need zero per-deployment cert work.

/api/caddy/ask handles Caddy's on-demand TLS hook — when a request arrives for a slug Caddy hasn't seen, it asks the deployer "is this slug live?" before issuing a cert.
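
Caddy passes the candidate name in a `domain` query parameter and only issues a certificate on a 200. A minimal handler sketch (web-standard Request/Response shapes; the "live" status set is an assumption):

```ts
// GET /api/caddy/ask?domain=<slug>.deployer.example.com
// 200 means Caddy may issue a cert for this name; anything else refuses it.
export async function caddyAsk(req: Request): Promise<Response> {
  const domain = new URL(req.url).searchParams.get("domain") ?? "";
  const slug = domain.split(".")[0]; // <slug>.deployer.<wildcard>
  const live = await prisma.deployment.findFirst({
    where: { slug, status: { in: ["running", "unhealthy"] } }, // assumed "live" set
  });
  return new Response(live ? "ok" : "unknown host", { status: live ? 200 : 404 });
}
```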

## Crash detection

worker/crash.ts runs once at startup and never returns:

```ts
docker.events({ filters: { event: ["die"] } })
  .on("data", chunk => {
    const evt = JSON.parse(chunk.toString());
    const id = evt.Actor.Attributes.deploymentId;
    markCrashed(id, evt.Actor.Attributes.exitCode);
  });
```

Each container is launched with --label deploymentId=<cuid>, so the watcher knows which row to update. Detection is sub-second — much faster than waiting for the next health probe.

## Failure isolation

A crash in the worker doesn't kill running containers (Docker keeps them alive). On worker restart:

  1. drainQueue resumes any queued row.
  2. The crash watcher reattaches to docker events.
  3. Health, metrics, log tailers re-bind to all running containers.

This is why we keep the lifecycle state in the DB rather than in-memory.
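
A sketch of the re-bind step, assuming a dockerode-style client and the deploymentId label from above (the three attach helpers are hypothetical stand-ins for the worker's tailer wiring):

```ts
// On worker startup, find every container we launched and re-attach
// the health prober, metric sampler, and log tailer to each one.
async function rebindAll() {
  const containers = await docker.listContainers({
    filters: { label: ["deploymentId"] }, // only containers this deployer started
  });
  for (const c of containers) {
    const deploymentId = c.Labels["deploymentId"];
    attachHealthProbe(deploymentId, c.Id);   // hypothetical helpers standing in
    attachMetricSampler(deploymentId, c.Id); // for the worker's real loops
    attachLogTailer(deploymentId, c.Id);
  }
}
```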
