Architecture & Startup

A registry node is a single Go process. On agentdns start it boots a stack of subsystems in a fixed order, then runs until it receives SIGINT or SIGTERM.

Subsystem map

| Layer | Source | Responsibility |
| --- | --- | --- |
| HTTP API | internal/api | REST endpoints, Swagger docs, middleware (rate limits, CORS, logging), WebSocket heartbeat at /v1/entities/{id}/ws, activity stream at /v1/ws/activity. |
| Mesh transport | internal/mesh/transport.go | TCP + TLS listener on port 4001, length-prefixed JSON wire protocol. |
| Gossip protocol | internal/mesh/gossip.go | Dedup, signature verification, key pinning, hop counting, broadcast. |
| Peer manager | internal/mesh/peers.go | Connection set, max-peer eviction, bloom-scored peer selection. |
| Bootstrap | internal/mesh/bootstrap.go | Dials bootstrap peers with exponential backoff, reconnect loop. |
| Bloom filters | internal/mesh/bloom.go | FNV double-hashing, periodic rebuild for query routing. |
| DHT | internal/dht | Kademlia routing table, iterative lookup, republish/expire. |
| Search engine | internal/search | BM25 keyword + semantic vectors + pluggable embedders. |
| Ranking | internal/ranking | Weighted-linear and RRF scoring. |
| Card fetcher | internal/card | HTTP fetch with .well-known fallbacks + LRU. |
| Cache | internal/cache | Redis adapter (optional). |
| Identity | internal/identity | Ed25519 keypairs, derivation proofs, signing. |
| Trust | internal/trust | EigenTrust calculator. |
| Store | internal/store | PostgreSQL persistence (pgxpool). |
| Event bus | internal/events | In-process pub/sub fan-out. |

Startup sequence

cmd/agentdns/main.go triggers the following on start:

  1. Load ~/.zynd/config.toml (or --config override).
  2. Load node identity from ~/.zynd/identity.json (Ed25519 keypair generated by agentdns init).
  3. Connect to PostgreSQL — pgxpool with min 2 / max 20 connections, 30 min lifetime, 5 min idle.
  4. Connect to Redis — optional; on failure the node fails open and keeps running without a cache.
  5. Initialize the search engine with the configured embedder (hash / onnx / http).
  6. Create the peer manager and gossip handler.
  7. Create the EigenTrust calculator.
  8. Start the mesh transport (TCP listener on :4001).
  9. Wire federated search into the mesh.
  10. Initialize the DHT (if [dht].enabled = true).
  11. Start background loops (next section).
  12. Start the HTTP API server.
  13. Block on SIGINT / SIGTERM for graceful shutdown.

Shutdown drains in-flight requests, closes the mesh listener, and lets pgxpool finish active queries before exiting.

Background loops

Once boot completes, these tickers run for the lifetime of the process:

| Loop | Interval | What it does |
| --- | --- | --- |
| DHT republish | 1 hour | Re-stores all locally-owned records at the K closest nodes. |
| Mesh heartbeat | 30 s | Broadcasts a MsgHeartbeat with bloom filter + peer addresses. |
| Peer reconnect | 30 s | Re-dials disconnected bootstrap peers (exponential backoff capped at 60 s). |
| Tombstone GC | 1 hour | Drops expired tombstones from tombstones and gossip_entries. |
| Liveness sweep | 60 s | Marks active agents as inactive if last_heartbeat < now - threshold (default 5 min) and gossips an agent_status announcement. |
| Bloom rebuild | 5 min | Rebuilds the local bloom filter from all current local + gossip agents. |

Wire protocol

The mesh transport is length-prefixed JSON over TCP+TLS:

[4 bytes big-endian length][JSON payload]
  • Max message size: 1 MB
  • Write timeout: 10 s
  • Read timeout: 90 s (3× heartbeat interval)

Message types — MsgHello, MsgHeartbeat, MsgGossip, MsgSearch, MsgSearchAck, MsgDHT. The handshake is two HELLOs carrying registry ID, public key, and current agent count; self-connections and duplicates are rejected.

Why TLS from Ed25519

The mesh uses self-signed TLS certificates derived from each node's Ed25519 key. CA trust is irrelevant on the mesh port — verification happens at the application layer in the HELLO handshake against keys learned from gossip, RIP, or DNS TXT. TLS 1.3 minimum.

The HTTP API on :8080 takes the opposite approach: it uses CA-issued certificates (Let's Encrypt is expected) because clients rely on TLS itself to verify the domain.

Internal event bus

Every subsystem publishes lifecycle events into an in-process pub/sub bus (internal/events/bus.go):

  • Each subscriber gets a 256-buffered channel.
  • Slow subscribers see drops, never backpressure.
  • The WebSocket activity stream at /v1/ws/activity is just one subscriber.

Categories: agent_*, gossip_*, search_*, peer_*, handle_*, name_*. Used for dashboards, metrics exporters, and tests.

Released under the MIT License.