Architecture & Startup
A registry node is a single Go process. On agentdns start it boots a stack of subsystems in a fixed order, then runs until it receives SIGINT or SIGTERM.
Subsystem map
| Layer | Source | Responsibility |
|---|---|---|
| HTTP API | internal/api | REST endpoints, Swagger docs, middleware (rate limits, CORS, logging), WebSocket heartbeat at /v1/entities/{id}/ws, activity stream at /v1/ws/activity. |
| Mesh transport | internal/mesh/transport.go | TCP + TLS listener on port 4001, length-prefixed JSON wire protocol. |
| Gossip protocol | internal/mesh/gossip.go | Dedup, signature verification, key pinning, hop counting, broadcast. |
| Peer manager | internal/mesh/peers.go | Connection set, max-peer eviction, bloom-scored peer selection. |
| Bootstrap | internal/mesh/bootstrap.go | Dial bootstrap peers with exponential backoff, reconnect loop. |
| Bloom filters | internal/mesh/bloom.go | FNV double-hashing, periodic rebuild for query routing. |
| DHT | internal/dht | Kademlia routing table, iterative lookup, republish/expire. |
| Search engine | internal/search | BM25 keyword + semantic vectors + pluggable embedders. |
| Ranking | internal/ranking | Weighted-linear and RRF scoring. |
| Card fetcher | internal/card | HTTP fetch with .well-known fallbacks + LRU. |
| Cache | internal/cache | Redis adapter (optional). |
| Identity | internal/identity | Ed25519 keypairs, derivation proofs, signing. |
| Trust | internal/trust | EigenTrust calculator. |
| Store | internal/store | PostgreSQL persistence (pgxpool). |
| Event bus | internal/events | In-process pub/sub fan-out. |
Startup sequence
cmd/agentdns/main.go triggers the following on start:
- Load ~/.zynd/config.toml (or the --config override).
- Load the node identity from ~/.zynd/identity.json (an Ed25519 keypair generated by agentdns init).
- Connect to PostgreSQL: pgxpool with min 2 / max 20 connections, 30 min max connection lifetime, 5 min idle timeout (pool settings sketched after this list).
- Connect to Redis (optional): failures fail open and the binary keeps running cache-less.
- Initialize the search engine with the configured embedder (hash / onnx / http).
- Create the peer manager and gossip handler.
- Create the EigenTrust calculator.
- Start the mesh transport (TCP listener on :4001).
- Wire federated search into the mesh.
- Initialize the DHT (if [dht].enabled = true).
- Start the background loops (next section).
- Start the HTTP API server.
- Block on SIGINT/SIGTERM for graceful shutdown.
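A minimal sketch of those pool settings with pgx v5; the function name and DSN handling are illustrative, not the actual main.go wiring:

```go
package main

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// openPool applies the pool parameters listed above. The name and the
// DSN argument are assumptions for this sketch.
func openPool(ctx context.Context, dsn string) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig(dsn)
	if err != nil {
		return nil, err
	}
	cfg.MinConns = 2                       // warm floor of idle connections
	cfg.MaxConns = 20                      // hard ceiling
	cfg.MaxConnLifetime = 30 * time.Minute // recycle long-lived connections
	cfg.MaxConnIdleTime = 5 * time.Minute  // drop connections idle too long
	return pgxpool.NewWithConfig(ctx, cfg)
}
```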
Shutdown drains in-flight requests, closes the mesh listener, and lets pgxpool finish active queries before exiting.
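A sketch of that signal-and-drain flow, assuming signal.NotifyContext; the parameter names are illustrative rather than the real wiring:

```go
package main

import (
	"context"
	"net"
	"net/http"
	"os/signal"
	"syscall"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// runUntilSignal blocks until SIGINT/SIGTERM, then drains in reverse
// boot order. Parameter names are assumptions for this sketch.
func runUntilSignal(httpSrv *http.Server, meshLn net.Listener, pool *pgxpool.Pool) {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	<-ctx.Done() // block until a signal arrives

	// HTTP first, then the mesh listener, then storage.
	drainCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	_ = httpSrv.Shutdown(drainCtx) // stop accepting; drain in-flight requests
	_ = meshLn.Close()             // close the TCP+TLS listener on :4001
	pool.Close()                   // blocks until active queries release their connections
}
```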
Background loops
Once boot completes, these tickers run for the lifetime of the process:
| Loop | Interval | What it does |
|---|---|---|
| DHT republish | 1 hour | Re-stores all locally-owned records at the K closest nodes. |
| Mesh heartbeat | 30 s | Broadcasts a MsgHeartbeat with bloom filter + peer addresses. |
| Peer reconnect | 30 s | Re-dials disconnected bootstrap peers (exponential backoff capped at 60 s). |
| Tombstone GC | 1 hour | Drops expired tombstones from tombstones and gossip_entries. |
| Liveness sweep | 60 s | Marks active agents as inactive if last_heartbeat < now - threshold (default 5 min) and gossips an agent_status announcement. |
| Bloom rebuild | 5 min | Rebuilds the local bloom filter from all current local + gossip agents. |
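Each loop follows the same shape: a goroutine owning a time.Ticker that exits when the root context is canceled. Below is a sketch using the liveness sweep as the example; sweepInactiveAgents is a hypothetical stand-in for the store update and agent_status gossip:

```go
package main

import (
	"context"
	"time"
)

// sweepInactiveAgents is a hypothetical placeholder for marking stale
// agents inactive and gossiping the status change.
func sweepInactiveAgents(cutoff time.Time) { /* ... */ }

// runLivenessSweep shows the ticker pattern shared by all six loops.
func runLivenessSweep(ctx context.Context, threshold time.Duration) {
	ticker := time.NewTicker(60 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Agents whose last_heartbeat is older than the threshold
			// (default 5 min) are marked inactive and announced.
			sweepInactiveAgents(time.Now().Add(-threshold))
		}
	}
}
```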
Wire protocol
The mesh transport is length-prefixed JSON over TCP+TLS:
[4 bytes big-endian length][JSON payload]

- Max message size: 1 MB
- Write timeout: 10 s
- Read timeout: 90 s (3× the heartbeat interval)
Message types — MsgHello, MsgHeartbeat, MsgGossip, MsgSearch, MsgSearchAck, MsgDHT. The handshake is two HELLOs carrying registry ID, public key, and current agent count; self-connections and duplicates are rejected.
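A sketch of the framing under those limits; writeFrame and readFrame are illustrative names, not the transport.go API:

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"io"
	"net"
	"time"
)

const maxMessageSize = 1 << 20 // 1 MB cap from the spec above

// writeFrame marshals msg and sends it as [4-byte length][JSON].
func writeFrame(conn net.Conn, msg any) error {
	payload, err := json.Marshal(msg)
	if err != nil {
		return err
	}
	if len(payload) > maxMessageSize {
		return errors.New("message exceeds 1 MB limit")
	}
	_ = conn.SetWriteDeadline(time.Now().Add(10 * time.Second)) // write timeout
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := conn.Write(hdr[:]); err != nil {
		return err
	}
	_, err = conn.Write(payload)
	return err
}

// readFrame reads one length-prefixed JSON message into msg.
func readFrame(conn net.Conn, msg any) error {
	_ = conn.SetReadDeadline(time.Now().Add(90 * time.Second)) // 3× heartbeat interval
	var hdr [4]byte
	if _, err := io.ReadFull(conn, hdr[:]); err != nil {
		return err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > maxMessageSize {
		return errors.New("frame too large")
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return err
	}
	return json.Unmarshal(buf, msg)
}
```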
Why TLS from Ed25519
The mesh uses self-signed TLS certificates derived from each node's Ed25519 key. CA trust is irrelevant on the mesh port — verification happens at the application layer in the HELLO handshake against keys learned from gossip, RIP, or DNS TXT. TLS 1.3 minimum.
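A sketch of minting such a certificate with the standard library, assuming the keypair from identity.json is already loaded; the subject name and validity window are illustrative:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// selfSignedTLS builds a mesh TLS config from the node's Ed25519 keypair.
// Function name, CommonName, and lifetime are assumptions for this sketch.
func selfSignedTLS(pub ed25519.PublicKey, priv ed25519.PrivateKey) (*tls.Config, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "agentdns-mesh"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, pub, priv)
	if err != nil {
		return nil, err
	}
	cert := tls.Certificate{Certificate: [][]byte{der}, PrivateKey: priv}
	return &tls.Config{
		Certificates:       []tls.Certificate{cert},
		MinVersion:         tls.VersionTLS13,
		InsecureSkipVerify: true, // CA checks skipped; keys are verified in the HELLO handshake
	}, nil
}
```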
The HTTP API on :8080 is the opposite: it uses CA-issued certs (Let's Encrypt is expected) because clients rely on TLS to verify the domain.
Internal event bus
Every subsystem publishes lifecycle events into an in-process pub/sub bus (internal/events/bus.go):
- Each subscriber gets a 256-buffered channel.
- Slow subscribers see drops, never backpressure.
- The WebSocket activity stream at /v1/ws/activity is just one subscriber.
Categories: agent_*, gossip_*, search_*, peer_*, handle_*, name_*. Used for dashboards, metrics exporters, and tests.
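A minimal sketch of that drop-on-full fan-out; the Bus and Event types are illustrative, not the internal/events API:

```go
package main

import "sync"

// Event carries a category string (e.g. "agent_registered") and a payload.
type Event struct {
	Category string
	Payload  any
}

// Bus fans events out to all subscribers without ever blocking the publisher.
type Bus struct {
	mu   sync.RWMutex
	subs []chan Event
}

// Subscribe returns a 256-buffered channel; the caller ranges over it.
func (b *Bus) Subscribe() <-chan Event {
	ch := make(chan Event, 256)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers to every subscriber with room in its buffer; full
// buffers drop the event rather than backpressure the publisher.
func (b *Bus) Publish(ev Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs {
		select {
		case ch <- ev:
		default: // slow subscriber: drop
		}
	}
}
```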
Next
- Identity Layer — how IDs are derived from public keys.
- Storage Schema — what each subsystem writes to PostgreSQL.
- Gossip Mesh — full propagation pipeline.