AgentDNS (Implementation)
agentdns is the open-source Go binary that powers zns01.zynd.ai and (typically) any other production registry. It's a single process backed by PostgreSQL — and optionally Redis — that runs the HTTP API, the TCP gossip mesh, the Kademlia DHT, the search engine, and the trust calculator.
If Registry Spec is the protocol, this section is the implementation: which Go files own which responsibility, what tables look like, what background loops fire.
When to read this
- You're operating a self-hosted registry node.
- You're debugging gossip propagation, search ranking, or DHT lookups.
- You're contributing to the binary.
- You're porting another runtime that conforms to the same wire protocol.
What's in the binary
A single Go process. Everything ships in one binary; subsystems are wired in cmd/agentdns/main.go.
Subsystem map
| Layer | Source | Responsibility |
|---|---|---|
| HTTP API | internal/api | REST endpoints, Swagger docs, middleware (rate limits, CORS, logging), WebSocket heartbeat at /v1/entities/{id}/ws, activity stream at /v1/ws/activity |
| Mesh transport | internal/mesh/transport.go | TCP + TLS listener on port 4001, length-prefixed JSON wire protocol |
| Gossip protocol | internal/mesh/gossip.go | Dedup, signature verification, key pinning, hop counting, broadcast |
| Peer manager | internal/mesh/peers.go | Connection set, max-peer eviction, bloom-scored peer selection |
| Bootstrap | internal/mesh/bootstrap.go | Dial bootstrap peers with exponential backoff, reconnect loop |
| Bloom filters | internal/mesh/bloom.go | FNV double-hashing, periodic rebuild for query routing |
| DHT | internal/dht | Kademlia routing table, iterative lookup, republish/expire |
| Search engine | internal/search | BM25 keyword + semantic vectors + pluggable embedders |
| Ranking | internal/ranking | Weighted-linear and RRF scoring |
| Card fetcher | internal/card | HTTP fetch with .well-known fallbacks + LRU |
| Cache | internal/cache | Redis adapter (optional) |
| Identity | internal/identity | Ed25519 keypairs, derivation proofs, signing |
| Trust | internal/trust | EigenTrust calculator |
| Store | internal/store | PostgreSQL persistence (pgxpool) |
| Event bus | internal/events | In-process pub/sub fan-out |
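The "FNV double-hashing" entry in the bloom filter row can be illustrated with a minimal sketch. It assumes the common construction where two base hashes (here 64-bit FNV-1a and FNV-1) are combined as `h1 + i*h2` to produce `k` probe positions. The type and function names, the filter size, and the choice of FNV variants are illustrative assumptions, not taken from internal/mesh/bloom.go.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a hypothetical fixed-size bloom filter using FNV-based double hashing.
type Bloom struct {
	bits []uint64
	m    uint64 // total number of bits (must be > 0)
	k    uint64 // probe positions per item
}

func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// baseHashes derives two 64-bit hashes from one item.
func baseHashes(item string) (uint64, uint64) {
	a := fnv.New64a()
	a.Write([]byte(item))
	b := fnv.New64()
	b.Write([]byte(item))
	// Force h2 odd so successive probes do not collapse onto one position.
	return a.Sum64(), b.Sum64() | 1
}

// Add sets the k bits at positions (h1 + i*h2) mod m.
func (b *Bloom) Add(item string) {
	h1, h2 := baseHashes(item)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// MayContain reports false only if the item was definitely never added.
func (b *Bloom) MayContain(item string) bool {
	h1, h2 := baseHashes(item)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := NewBloom(1024, 4)
	f.Add("translation")
	fmt.Println(f.MayContain("translation")) // true: added items always match
}
```

A filter like this can return false positives but never false negatives, which is why it suits query routing: a peer whose filter says "no" can safely be skipped.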
Startup sequence
cmd/agentdns/main.go triggers the following on start:
- Load ~/.zynd/config.toml (or --config override).
- Load node identity from ~/.zynd/identity.json (Ed25519 keypair generated by agentdns init).
- Connect to PostgreSQL: pgxpool with min 2 / max 20 connections, 30 min lifetime, 5 min idle.
- Connect to Redis: optional; failures fail-open and the binary keeps running cache-less.
- Initialise the search engine with the configured embedder (hash / onnx / http).
- Create the peer manager and gossip handler.
- Create the EigenTrust calculator.
- Start the mesh transport (TCP listener on :4001).
- Wire federated search into the mesh.
- Initialise the DHT (if [dht].enabled = true).
- Start background loops (next section).
- Start the HTTP API server.
- Block on SIGINT / SIGTERM for graceful shutdown.
Shutdown drains in-flight requests, closes the mesh listener, and lets pgxpool finish active queries before exiting.
Background loops
Once boot completes, these tickers run for the lifetime of the process:
| Loop | Interval | What it does |
|---|---|---|
| DHT republish | 1 hour | Re-stores all locally-owned records at the K closest nodes |
| Mesh heartbeat | 30 s | Broadcasts a MsgHeartbeat with bloom filter + peer addresses |
| Peer reconnect | 30 s | Re-dials disconnected bootstrap peers (exponential backoff capped at 60 s) |
| Tombstone GC | 1 hour | Deletes expired tombstone rows from the tombstones and gossip_entries tables |
| Liveness sweep | 60 s | Marks active agents inactive if last_heartbeat < now - threshold (default 5 min) and gossips an agent_status announcement |
| Bloom rebuild | 5 min | Rebuilds the local bloom filter from all current local + gossip agents |
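The "exponential backoff capped at 60 s" rule in the peer reconnect row can be sketched as follows. The 1 s starting delay and the function name are assumptions for illustration; the source states only the cap. A production loop would typically also add jitter, which is omitted here.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 1 * time.Second  // assumed starting delay; not stated in the source
	maxDelay  = 60 * time.Second // cap from the background loop table
)

// reconnectDelay returns the wait before redial attempt n (0-based).
// The delay doubles on each attempt and never exceeds maxDelay.
func reconnectDelay(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	for attempt := 0; attempt <= 7; attempt++ {
		fmt.Printf("attempt %d: wait %s\n", attempt, reconnectDelay(attempt))
	}
}
```

Under these assumptions the delays run 1 s, 2 s, 4 s, 8 s, 16 s, 32 s and then stay at 60 s.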
Wire protocol
The mesh transport is length-prefixed JSON over TCP+TLS:
[4 bytes big-endian length][JSON payload]
- Max message size: 1 MB
- Write timeout: 10 s
- Read timeout: 90 s (3× heartbeat interval)
Message types:
| Type | Purpose |
|---|---|
| MsgHello | Handshake: registry ID, public key, current agent count |
| MsgHeartbeat | Periodic peer heartbeat with bloom filter + peer list |
| MsgGossip | Announcement propagation |
| MsgSearch | Federated search query |
| MsgSearchAck | Federated search results |
| MsgDHT | Kademlia STORE / FIND_VALUE / FIND_NODE / PING |
The handshake is two HELLOs carrying registry ID, public key, and current agent count; self-connections and duplicates are rejected.
Why TLS from Ed25519
The mesh uses self-signed TLS certificates derived from each node's Ed25519 key. CA trust is irrelevant on the mesh port — verification happens at the application layer in the HELLO handshake against keys learned from gossip, the registry identity proof, or DNS TXT. TLS 1.3 minimum.
The HTTP API on :8080 is the opposite: it uses CA-issued certs (Let's Encrypt is expected) because clients use TLS to verify the domain.
Identity — IDs from public keys
Every entity ID is derived deterministically:
agent ID = "zns:" + sha256(pubkey)[:16].hex()
service ID = "zns:svc:" + sha256(pubkey)[:16].hex()
developer ID = "zns:dev:" + sha256(pubkey)[:16].hex()
For HD-derived agent keys, the formula is:
seed = SHA-512(dev_priv_key || "zns:agent:" || uint32_be(index))[:32]
agent_kp = Ed25519(seed)
The verification logic in internal/identity reproduces both formulas to validate a developer_proof on POST /v1/entities.
Storage schema (PostgreSQL)
| Table | Purpose |
|---|---|
| agents | Registry record per agent |
| services | Registry record per service |
| developers | Developer profiles |
| handles | ZNS handle claims |
| zns_names | ZNS name bindings (handle → entity) |
| zns_versions | Version history of name bindings |
| gossip_entries | Replication log for cross-node propagation |
| tombstones | Deregistered entity markers; suppressed during gossip dedup |
| peers | Currently-known peer addresses + public keys |
| card_cache | Optional Postgres cache when Redis is unavailable |
Each table is owned by internal/store/<table>.go.
Card cache
internal/card fetches /.well-known/agent-card.json from each entity's entity_url:
- TTL: 1 hour.
- LRU bound: 1000 cards by default (configurable).
- Fall-back paths:
/agent-card.json,/.well-known/agent.json(for older entities),/well-known/agent.json. - Validates Ed25519 signature against the registry record's
public_key.
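The lookup order above can be expressed as a small helper that expands an entity_url into the candidate card URLs, primary path first. This sketch assumes the paths are resolved against the origin of entity_url (any path or query on entity_url is discarded); the source does not say whether that is the case. The function and variable names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// cardPaths lists the primary path followed by the fallbacks, in the order tried.
var cardPaths = []string{
	"/.well-known/agent-card.json",
	"/agent-card.json",
	"/.well-known/agent.json",
	"/well-known/agent.json",
}

// cardURLs returns the candidate card URLs for an entity, in fetch order.
func cardURLs(entityURL string) ([]string, error) {
	base, err := url.Parse(entityURL)
	if err != nil {
		return nil, err
	}
	if base.Scheme == "" || base.Host == "" {
		return nil, errors.New("entity_url must be absolute")
	}
	out := make([]string, 0, len(cardPaths))
	for _, p := range cardPaths {
		u := *base // copy so each candidate keeps the same scheme and host
		u.Path = p
		u.RawPath = ""
		u.RawQuery = ""
		u.Fragment = ""
		out = append(out, u.String())
	}
	return out, nil
}

func main() {
	urls, err := cardURLs("https://agent.example.com/api")
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

A fetcher would try each URL in turn, stop at the first response that parses, and then verify the card's signature before caching it.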
Internal event bus
Every subsystem publishes lifecycle events into an in-process pub/sub bus (internal/events/bus.go):
- Each subscriber gets a 256-buffered channel.
- A slow subscriber whose buffer fills up has events dropped; publishers never block on backpressure.
- The WebSocket activity stream at /v1/ws/activity is just one subscriber.
Categories: agent_*, gossip_*, search_*, peer_*, handle_*, name_*. Used for dashboards, metrics exporters, and tests.
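The drop-instead-of-block behaviour described above is commonly implemented with a non-blocking channel send. The following is a minimal sketch of that pattern, not the code in internal/events/bus.go; the type names, the `Publish` return value, and the `agent_registered` event name are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

const subscriberBuffer = 256 // per-subscriber channel capacity

// Event is a hypothetical event shape carrying only a category name.
type Event struct {
	Category string
}

// Bus fans events out to every subscriber without ever blocking the publisher.
type Bus struct {
	mu   sync.RWMutex
	subs []chan Event
}

// Subscribe registers a new buffered channel and returns its receive side.
func (b *Bus) Subscribe() <-chan Event {
	ch := make(chan Event, subscriberBuffer)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers ev to each subscriber and returns how many deliveries
// were dropped because a subscriber's buffer was full.
func (b *Bus) Publish(ev Event) int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	dropped := 0
	for _, ch := range b.subs {
		select {
		case ch <- ev:
		default: // buffer full: drop rather than wait
			dropped++
		}
	}
	return dropped
}

func main() {
	bus := &Bus{}
	ch := bus.Subscribe()
	dropped := 0
	for i := 0; i < 300; i++ { // nobody reads from ch, so it fills up
		dropped += bus.Publish(Event{Category: "agent_registered"})
	}
	fmt.Println(len(ch), dropped) // prints "256 44"
}
```

The trade-off is that a lagging dashboard or exporter can miss events, but it can never stall registration, gossip, or search.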
CLI
The agentdns binary itself ships these subcommands (separate from the user-facing zynd CLI):
| Command | Purpose |
|---|---|
| agentdns init | Generate node identity + write a default config.toml |
| agentdns start | Run the registry process |
| agentdns migrate | Run pending Postgres migrations |
| agentdns peer add <addr> | Manually add a peer to the persistent peer list |
| agentdns peer list | Print the current peer set |
| agentdns trust list | Print top-N trust scores |
| agentdns version | Build version |
The full operator's flow lives at Run a Registry Node.
Configuration
Config is a single TOML file (default ~/.zynd/config.toml). Top-level sections:
[server]
http_port = 8080
mesh_port = 4001
external_url = "https://my-registry.example.com"
[postgres]
url = "postgres://user:pass@localhost:5432/agentdns"
[redis]
url = "redis://localhost:6379"
[search]
embedder = "onnx"
onnx_model = "bge-small-en-v1.5"
[mesh]
bootstrap_peers = ["zns-boot.zynd.ai:4001"]
listen_port = 4001
max_peers = 64
[dht]
enabled = true
k = 20
alpha = 3
[onboarding]
mode = "open" # or "restricted"
# auth_url is unset here; TOML has no null value, so omit the key instead of writing null
[heartbeat]
ttl_minutes = 5
agentdns init generates a default file with sensible values.
Observability
| Surface | Endpoint |
|---|---|
| Health | GET /health |
| Node info | GET /v1/info |
| Network status | GET /v1/network/status |
| Network stats | GET /v1/network/stats |
| Peer list | GET /v1/network/peers |
| Live activity stream | WSS /v1/ws/activity |
| Prometheus (optional) | GET /metrics if [metrics].enabled = true |
For an operator's monitoring playbook see Metrics & Monitoring.
See also
- Registry Spec — what the protocol means.
- REST API — every endpoint exposed by this binary.
- Run a Registry Node — operator's guide.