An AI coding assistant gets a task: "audit this PR for SQL injection." The assistant doesn't do security review itself; that work belongs to a different specialist agent. So how does it find one?

This is agent discovery: the problem of how one AI agent locates, evaluates, and selects another agent that can do the work it can't. IDC projects more than one billion deployed AI agents by 2029, roughly forty times the 2025 number; the discovery layer that connects them is one of the most contested infrastructure problems in agentic systems right now, and there is no consensus answer.

This piece walks through the five approaches that exist in production today, what each covers, where each breaks, and what a serious agent discovery layer actually needs.

What's in this guide

  • The two distinct meanings of "agent discovery" (and why this guide covers only one)
  • Five real discovery patterns in production right now
  • The gaps the A2A spec itself admits to leaving open
  • Why vendor-run registries can't be neutral
  • Six properties a discovery layer needs to scale beyond one team

"Agent discovery" means two different things

Before going further, a disambiguation. The phrase agent discovery covers two distinct problems with different audiences and different products.

The first is enterprise agent inventory: scanning your cloud accounts, SaaS tenants, and on-prem infrastructure to find AI agents that already exist — often shipped by vendors, sometimes deployed by shadow IT. This is what Palo Alto's Prisma AIRS, Cisco's agentic security tooling, Arthur AI, and Salesforce's MuleSoft Agent Fabric address. The motivating term is agent sprawl, and the buyer is a CISO or platform team.

The second is runtime agent selection: an orchestrator agent, mid-task, needs to find another agent that can do a subtask. The buyer is a developer building a multi-agent system, and the question is closer to "how does package discovery work for agents?" than "what's running in our environment?"

This guide covers the second problem. The first is real and important, but the patterns are different.

Why runtime agent discovery isn't service discovery

Microservice discovery — Consul, etcd, DNS-SD, Eureka — solved one specific problem: given a known service name, return one or more healthy network endpoints. Every instance of checkout-service spoke the same protocol, returned the same response shapes, and ran the same code. Discovery was a routing problem.

Agent discovery is harder along three independent axes:

1. The protocol surface is not uniform. One agent talks MCP, another talks A2A, a third exposes a private REST API, a fourth runs inside LangGraph and is only callable from another LangGraph node. There is no shared transport, and a recent survey of agent interoperability protocols lists at least four candidate standards (MCP, ACP, A2A, ANP), none of which has won.

2. Capability is fuzzy, not enumerable. A microservice exposes a finite set of methods with typed parameters. An agent exposes "what it's good at," typically described in natural language and frequently divergent from what it actually does in production. Two agents with identical descriptions ("summarize technical PDFs") can have different output formats, latency profiles, and failure modes.

3. Trust must be earned, not declared. A microservice in your VPC is implicitly trusted because you deployed it. An external agent is not. The discovery system has to surface trust signals — historical success, schema conformance, reputation — alongside the capability match itself.

Solving routing is roughly five percent of the problem. The other ninety-five percent is capability disambiguation and trust.

The five patterns in production today

1. Hardcoded clients (the silent default)

The most common "discovery" mechanism in production agent systems is no discovery at all. The orchestrator imports an SDK, hardcodes the agent endpoint, and calls it directly. LangGraph subgraphs, CrewAI's Crew(agents=[...]) constructor, and most production multi-agent systems shipped to date work this way.
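A minimal sketch of the pattern, with every name illustrative rather than drawn from any specific framework:

from dataclasses import dataclass

@dataclass
class RemoteAgent:
    """Thin client around one hardcoded agent endpoint (illustrative)."""
    endpoint: str

    def run(self, payload: dict) -> dict:
        # A real client would POST payload to self.endpoint; stubbed here.
        raise NotImplementedError

# The entire "discovery" layer: a dict frozen at deploy time.
SPECIALISTS = {
    "security_review": RemoteAgent("https://agents.internal/security-review"),
    "summarize_pdf": RemoteAgent("https://agents.internal/summarize"),
}

def delegate(task_type: str, payload: dict) -> dict:
    # No lookup, no ranking, no trust signal: an unknown task type is a
    # KeyError, and a new specialist is a code change plus a redeploy.
    return SPECIALISTS[task_type].run(payload)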

This is fine when the agent network is small, internal, and stable. It breaks the moment any of those assumptions changes. Adding a new specialist means a code change in every orchestrator. Replacing one specialist with a better one means a coordinated migration. Cross-team or cross-organization agent reuse is impossible without out-of-band coordination.

Honest framing: hardcoded clients are not a discovery strategy; they are a deferral strategy. Most systems hit the wall in their second year.

2. Tool-side registries — the MCP Registry pattern

Anthropic's Model Context Protocol, open-sourced in November 2024, took a deliberate stance: don't model agents at all, model tools that an agent runtime can pick up. An MCP server exposes a typed list of tools, the host runtime (Claude Desktop, Cursor, Zed, and others) introspects them, and the LLM decides which to call.

The official MCP Registry launched in preview on September 8, 2025 at registry.modelcontextprotocol.io, after starting as a grassroots project earlier that year. Two design choices matter for the discovery question:

  • The registry holds metadata about MCP servers, not the servers themselves. It is a catalog with an OpenAPI spec, not a hosting platform.
  • The architecture is federated. The official registry is positioned as a "primary source of truth" that client marketplaces (Smithery, mcp.run, vendor-curated catalogs) can enhance with opinionated ranking.
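The catalog-not-host distinction is easy to see in practice. A sketch of a registry query, assuming the /v0/servers path and response fields from the preview-era OpenAPI spec (both may change):

import requests

resp = requests.get(
    "https://registry.modelcontextprotocol.io/v0/servers",
    params={"limit": 30},   # pagination parameter per the preview spec
    timeout=10,
)
resp.raise_for_status()

for entry in resp.json().get("servers", []):
    # Metadata only: the registry tells you a server exists and what it
    # claims to be; hosting and invocation happen elsewhere.
    print(entry.get("name"), "-", entry.get("description", ""))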

This works well when the unit of capability really is a tool — a file system mount, a Postgres connector, a Stripe wrapper. It works less well when the unit of capability is an agent. An MCP tool list says "these are the moves you can make"; it does not say "this server is good at SQL injection review and bad at React refactoring." Capability metadata at the MCP layer is shallow because that wasn't its design goal. MCP was never meant to be an agent registry, and the friction shows when teams try to bend it into one.

3. Agent-side cards — the A2A pattern

Google announced the Agent2Agent (A2A) protocol on April 9, 2025 at Google Cloud Next, with more than fifty launch partners including Salesforce, LangChain, MongoDB, and SAP. In June 2025 Google donated A2A to the Linux Foundation, where it now lives under neutral governance.

A2A models the agent itself as a first-class network entity. Each agent publishes an Agent Card at /.well-known/agent-card.json — a JSON document advertising its name, capabilities, supported message formats, authentication scheme, and skill list.

{
  "name": "security-review-agent",
  "description": "Reviews code diffs for OWASP Top 10 vulnerabilities",
  "url": "https://agents.example.com/security-review",
  "version": "1.4.0",
  "capabilities": {
    "streaming": true,
    "pushNotifications": false
  },
  "skills": [
    {
      "id": "review_diff",
      "name": "Review code diff",
      "inputModes": ["text/x-diff", "application/json"],
      "outputModes": ["application/json"]
    }
  ]
}

The well-known URI follows RFC 8615: if you know an agent's domain, you can fetch its card with one HTTP GET. The A2A spec defines three discovery strategies: well-known URI, curated registries, and direct configuration.
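Given the card format above, resolution really is a single request. A minimal sketch, using the placeholder domain from the example card:

import requests

card = requests.get(
    "https://agents.example.com/.well-known/agent-card.json",
    timeout=10,
).json()

print(card["name"], card["version"])
for skill in card.get("skills", []):
    # Each skill declares what it accepts and returns, so a caller can
    # check I/O compatibility before sending a task.
    print(f"  {skill['id']}: {skill.get('inputModes')} -> {skill.get('outputModes')}")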

The gap A2A admits: the spec itself states that "the current A2A specification does not prescribe a standard API for curated registries." Discovery beyond a single known domain is delegated to higher layers. Solo.io — which sells an agent gateway product addressing this gap — frames the missing pieces as agent registration, an agent naming service, and a gateway for resolution. The vendor incentive is worth flagging, but the gap analysis is real.

A2A is the most architecturally clean of the five approaches. It is also the most under-implemented at scale: reference implementations exist, but real agent networks built on A2A are still rare outside Google-aligned deployments.

4. Hub catalogs — Hugging Face and GitHub patterns

Hugging Face's Spaces and GitHub repository topics filled a vacuum: developers needed to browse agents like they browse libraries, with star counts, READMEs, and download metrics. Through 2025, "agent" became one of the largest growth categories on both platforms.

Hub catalogs solve a real problem — human discovery — but they index for humans, not agents. The retrieval interface is a search box and a sort order, not a callable API. An orchestrator agent cannot programmatically query "find me an agent that takes a code diff and returns CVE matches with under 5s latency"; a developer has to translate that need into a search, evaluate READMEs, and integrate by hand.

This is fine when agent integration is a quarterly project. It is not fine when an agent in a multi-step pipeline needs to pick a subagent at runtime.

5. Open marketplaces — semantic search plus reputation

The newest pattern, and the one closest to a complete answer, is the open agent marketplace. The marketplace runs three jobs that the previous four patterns don't unify:

  1. Semantic capability matching. The buyer agent describes the task in natural language, and the marketplace returns ranked candidates based on capability schemas, declared skills, and historical performance — not just keyword search against a description field. The 2025 arXiv paper on agent discovery in the Internet of Agents formalizes this as a two-stage framework: credible capability publishing, then context-aware search and ranking (see the sketch after this list).
  2. Verification. When an agent registers, the marketplace runs smoke tests against its declared capabilities. An agent claiming "summarize PDFs" is handed a benchmark PDF; it either returns a passing summary or it does not get listed.
  3. Reputation as a first-class signal. Every completed transaction updates the agent's success rate, latency profile, and schema-conformance score. Discovery ranking is reputation-aware from day one.
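What the orchestrator-facing side of this could look like, as a sketch: the endpoint, request shape, and response fields below are hypothetical, not any specific marketplace's API.

import requests

resp = requests.post(
    "https://marketplace.example.com/v1/discover",   # hypothetical endpoint
    json={
        "task": "review a code diff for SQL injection, return CVE matches",
        "constraints": {"max_latency_ms": 5000},
    },
    timeout=10,
)

for candidate in resp.json()["candidates"]:
    # Ranking blends the semantic match with verification and reputation,
    # the three jobs described above.
    print(candidate["agent_id"],
          f"match={candidate['match_score']:.2f}",
          f"success_rate={candidate['success_rate']:.0%}")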

Open marketplaces are the only discovery layer that is also a coordination layer. The match isn't a URL — it is the start of a contract that includes acceptance criteria, validation, and settlement. This matters because discovery without a transactional path leaves the orchestrator agent with the same integration work it was trying to avoid.

Comparison matrix

Pattern | Programmatic query | Capability metadata | Cross-vendor | Built-in trust signal
Hardcoded clients | None | In code | Yes (manual) | None
MCP Registry | Yes (OpenAPI) | Tool list, shallow | Yes | Install counts
A2A Agent Cards | Yes (per-domain) | Skills + I/O modes | Yes (in spec) | None native
Hub catalogs | No | READMEs (unstructured) | Yes | Stars, downloads
Open marketplaces | Yes | Capability schema + smoke tests | By design | Reputation + verification

The cross-vendor problem most teams underestimate

Every major LLM vendor today has either shipped or is shipping an agent discovery surface. Anthropic has the MCP Registry. Google has A2A and Vertex AI's agent catalog. Salesforce has AgentExchange and MuleSoft Agent Fabric. AWS Bedrock has its own marketplace. OpenAI's Agents SDK ships with a default tool list.

None of these are neutral, and they cannot be. The economic incentive of every vendor-run registry is to grow the parent ecosystem. A registry that ranks a Claude-tuned agent above a Gemini-tuned agent for the same task is a feature inside Anthropic's stack and a bug inside Google's. There is no business model under which a vendor registry honestly answers "what is the best agent for this task, regardless of base model?"

The structural read: the most useful discovery layer is the one no individual vendor can build. That is the position open registries and neutral marketplaces are trying to occupy — the same role npm plays for JavaScript packages, where the registry doesn't pick winners between React and Vue. It is also why A2A's transition to the Linux Foundation matters more than the protocol design itself: governance is the differentiator.

For most teams this is a future problem. For teams already running a mix of Claude-based, GPT-based, and self-hosted agents, it is a present one.

What a serious agent discovery layer needs

The five patterns above each cover one or two of the requirements below. None covers all six:

  • Structured capability schema. Not freeform tags or an LLM-summarized README. A typed declaration of inputs, outputs, skills, and constraints. The schema is what makes capability matching a search problem instead of a guess (see the sketch after this list).
  • Vendor neutrality with credible governance. Hosted under an entity whose ranking incentives are not tied to one LLM, framework, or hosting environment. Linux Foundation-style stewardship is one credible model.
  • Verification at registration. Capability claims are tested with smoke benchmarks before the agent appears in results. The registry filters out agents whose self-description does not survive contact with reality.
  • Portable reputation. An agent's track record is owned by the agent, not the registry. Lock-in is the failure mode that killed most pre-2025 vertical marketplaces; portability is the credibility hedge against it.
  • Pull-friendly execution. Discovery has to coexist with workers that do not expose inbound endpoints. The match-to-call path needs the supplier pull model as a first-class option, not an afterthought.
  • Machine-callable interface. The orchestrator runs the discovery query, not a human. The result is a structured candidate list, not an HTML page someone reads.
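The first property is the most concrete, so here is one possible shape for it, sketched as Python type definitions. The field names are illustrative, not a published standard.

from typing import TypedDict

class Skill(TypedDict):
    id: str
    input_modes: list[str]    # MIME types the skill accepts
    output_modes: list[str]   # MIME types it returns
    max_latency_ms: int       # declared performance bound, checkable at runtime

class CapabilityDeclaration(TypedDict):
    agent_id: str
    skills: list[Skill]
    verified_at: str          # timestamp of the last passing smoke test

A declaration like this is what turns "find me a PDF summarizer" into a filter over typed fields instead of a similarity search over prose.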

None of these are exotic requirements. They are the unsexy operational properties that distinguish a discovery system you can build a real product on from one you can demo at a conference.

Where this is heading

Three trends are visible in the current spec activity.

The first is convergence on the well-known URI pattern. A2A's /.well-known/agent-card.json is being adopted as a generic capability advertisement even by agents that don't speak A2A natively. The pattern composes with anything: a static site can host a card, a registry can crawl cards from a domain list, a runtime can fetch a card to learn what an agent supports.

The second is federation between registries instead of one winner. The MCP Registry's preview release explicitly described itself as a primary source of truth that downstream marketplaces can curate on top of. A2A's spec defers registry semantics to higher layers. The shape that's emerging is "one neutral catalog, many opinionated views" — closer to DNS than to a single app store.

The third is compute-based reputation in place of human reviews. Star ratings in agent catalogs are vulnerable to the same gaming problems that hit npm and the App Store. Reputation derived from structured validation outcomes — did the delivered output match the declared schema? did it pass acceptance criteria? — is harder to game and more useful as a discovery signal.
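A deliberately naive sketch of what that could mean mechanically, with all weighting, decay, and anti-gaming machinery omitted:

from dataclasses import dataclass

@dataclass
class ValidationRecord:
    schema_conformant: bool   # did the output match the declared schema?
    acceptance_passed: bool   # did it pass the task's acceptance criteria?

def reputation(history: list[ValidationRecord]) -> float:
    # Fraction of completed transactions that passed both checks.
    if not history:
        return 0.0
    passed = sum(r.schema_conformant and r.acceptance_passed for r in history)
    return passed / len(history)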

Frequently asked questions

How do AI agents find each other?

Through one of five patterns: hardcoded endpoints in the orchestrator's code, a tool-side registry like Anthropic's MCP Registry, an agent-side card at /.well-known/agent-card.json as defined by Google's A2A protocol, a hub catalog like Hugging Face Spaces, or an open marketplace that combines semantic search with reputation. Most production systems today rely on the first; the most architecturally complete is the fifth.

Is there a standard agent discovery protocol?

There is no single winner. The most cited contenders are MCP (tool-side) and A2A (agent-side), with ACP and ANP as smaller efforts. A2A's discovery section explicitly defers registry semantics to higher layers, which is why third-party marketplaces and gateway products are appearing in that gap.

What is the difference between the MCP Registry and A2A?

MCP exposes tools to a runtime; A2A exposes agents to other agents. The MCP Registry catalogs tool servers; A2A's discovery story is per-domain via well-known URIs, with curated registries left as an open extension point. The MCP vs A2A comparison in the reading order below goes a level deeper.

What's the difference between agent discovery and agent inventory?

Agent inventory is the enterprise-security problem of finding agents already running in your environment (Palo Alto, Arthur AI, Salesforce MuleSoft Agent Fabric address this). Agent discovery, in the developer sense, is the problem of an orchestrator agent selecting another agent at runtime to do a subtask. Different audiences, different products.

Practical reading order

If you are designing the discovery layer for a multi-agent system, the protocol comparison in MCP vs A2A is the next stop — it goes a level deeper on transport and message semantics. The agent marketplace concept covers what discovery looks like when it is bundled with contracts and settlement, and agent-to-agent commerce walks through the full transactional flow that a serious discovery result hands off to. If your concrete blocker is workers behind NAT, start with the supplier pull model.


Alec Zakhary

Alec writes about decentralized agent orchestration, supplier pull workers, validation pipelines, and trust layers for agent-to-agent commerce.
