Designing Connector Schemas AI Can Reason About
By CorpusIQ LLC
The bad news about connector-building in the MCP era: most API-wrapper work is still manual. The good news: the thinking has gotten more interesting.
In the Zapier era, wrapping an API was mechanical. Map endpoints to actions. Map fields to inputs. Ship it. The connector worked because a human wired it into a workflow and told it exactly what to do.
In the MCP era, the connector has to work because an AI reads its schema and decides when to call it. The AI does not get told. It reads the schema and reasons.
That reasoning succeeds or fails based entirely on how the schema is written. A connector with a mediocre schema will be misrouted, underused, or ignored by the AI. A connector with a well-designed schema will get called correctly even on ambiguous queries. Here is what separates the two.
What a minimal MCP schema looks like
The MCP specification requires very little. A tool needs a name, a description, and an input schema.
{
  "name": "get_shopify_order_summary",
  "description": "Get summary of recent orders",
  "inputSchema": {
    "type": "object",
    "properties": {
      "start_date": { "type": "string" },
      "end_date": { "type": "string" }
    }
  }
}

This is technically compliant. It will also get misrouted constantly. "Get summary of recent orders" is vague enough that the AI will fail to call it when asked "how much did we sell this month?" The vocabulary does not overlap. The description is too thin to bridge the gap. Spec compliance is not the goal. Reasoning readiness is.
The reasoning-ready schema
Here is the same connector with fields that actually help an AI route correctly.
{
  "name": "get_shopify_order_summary",
  "description": "Summarize Shopify orders over a date range. Returns revenue totals, order count, average order value, and top products. Answers questions about sales performance, revenue trends, and ecommerce activity.",
  "intent_tags": ["revenue", "sales", "orders", "ecommerce", "trend", "performance"],
  "answers_questions_like": [
    "How much did we sell this month?",
    "What is our revenue this quarter?",
    "Why is revenue dropping?",
    "What is our average order value?",
    "What are our top-selling products?"
  ],
  "complementary_tools": [
    "get_klaviyo_campaign_revenue",
    "run_ga4_report",
    "get_quickbooks_profit_loss"
  ],
  "returns": "Revenue total in USD, order count, average order value, top 5 products by revenue, date range metadata",
  "inputSchema": {
    "type": "object",
    "properties": {
      "start_date": {
        "type": "string",
        "description": "ISO 8601 date, inclusive. Defaults to 30 days ago."
      },
      "end_date": {
        "type": "string",
        "description": "ISO 8601 date, inclusive. Defaults to today."
      }
    }
  }
}

The difference is not the code. The difference is that an AI reading this schema has enough information to route correctly across a dozen different phrasings of the same underlying question.
Field-by-field rationale
description: not a summary, a router hint. The description field is what the AI reads first. It has to do three jobs: identify the system (Shopify), identify the operation (summarize orders), and signal what kinds of questions the tool answers. The pattern that works: system name, operation verb, what the output enables. What to avoid: descriptions that explain how the tool works internally, describe parameters instead of operations, or use vocabulary only engineers use.
intent_tags: the semantic routing vocabulary. Not part of the MCP spec. Not standardized. Use it anyway. Intent tags are a compact list of domain words that the AI can match against a user's query. They let the routing layer do a first-pass filter before semantic reasoning. Keep them broad, not narrow. "ecommerce" is more useful than "shopify_admin_api." Six to ten tags per tool is the sweet spot.
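This first-pass filter can be sketched in a few lines. The sketch below assumes a hypothetical in-memory tool registry; get_gmail_threads and the filter function are illustrative names, not part of any spec.

```python
# Hypothetical first-pass filter: keep only tools whose intent_tags
# overlap the words of the user's query, before any semantic reasoning.
def first_pass_filter(query: str, tools: list[dict]) -> list[dict]:
    query_words = set(query.lower().split())
    return [t for t in tools if query_words & set(t.get("intent_tags", []))]

tools = [
    {"name": "get_shopify_order_summary",
     "intent_tags": ["revenue", "sales", "orders", "ecommerce", "trend", "performance"]},
    {"name": "get_gmail_threads",  # illustrative second tool
     "intent_tags": ["email", "inbox", "messages", "communication"]},
]

candidates = first_pass_filter("what is our revenue this quarter", tools)
# Only get_shopify_order_summary survives the filter.
```

Broad tags make this cheap filter catch more phrasings; narrow API-flavored tags would make it miss them.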
answers_questions_like: the query exemplars. This is the field that does the most work per character. Five to ten concrete example questions that the tool can answer. Written the way a user would actually ask, not the way the API documentation phrases it. Include multiple phrasings of the same underlying question. "How much did we sell this month?" and "What is our revenue this quarter?" are the same question with different vocabulary. Including both trains the AI that either phrasing maps to this tool. Also include question forms that indicate the need for this tool obliquely. "Why is revenue dropping?" does not literally ask for an order summary, but it requires one.
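As a rough illustration of how exemplars help, the sketch below scores a query against the exemplar list with Jaccard token overlap. This is a stand-in for the embedding similarity a production router would actually use; the function name is an assumption.

```python
# Rough exemplar matcher: best token-overlap (Jaccard) score between the
# user's query and any answers_questions_like example.
def exemplar_score(query: str, exemplars: list[str]) -> float:
    q = set(query.lower().split())
    best = 0.0
    for ex in exemplars:
        e = set(ex.lower().split())
        best = max(best, len(q & e) / len(q | e))
    return best

exemplars = [
    "How much did we sell this month?",
    "What is our revenue this quarter?",
    "Why is revenue dropping?",
]

score = exemplar_score("what is our revenue this quarter?", exemplars)  # 1.0: matches an exemplar exactly
low = exemplar_score("summarize my unread email", exemplars)            # 0.0: no overlap at all
```

The point the sketch makes: every extra phrasing you include in the exemplars is another region of query space that scores high for this tool.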
complementary_tools: the cross-system hint. This field is underappreciated. It is what turns a directory of tools into a reasoning graph. When a user asks "why did revenue drop last month?" the answer is rarely in one system. The Shopify order summary needs to be correlated with Klaviyo campaign data, GA4 sessions, and QuickBooks. The complementary_tools field teaches the router that when get_shopify_order_summary is called, these other tools are frequently useful in the same request. Three to five complementary tools per connector is a good target.
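One way a planner might consume this field, sketched against a hypothetical registry (the registry shape and expand_plan are assumptions, not spec):

```python
# Sketch of plan expansion: when a tool is selected, surface its
# complementary_tools so the planner can chain them in the same request.
registry = {
    "get_shopify_order_summary": {
        "complementary_tools": [
            "get_klaviyo_campaign_revenue",
            "run_ga4_report",
            "get_quickbooks_profit_loss",
        ],
    },
    "run_ga4_report": {"complementary_tools": []},
}

def expand_plan(selected: str) -> list[str]:
    extras = registry.get(selected, {}).get("complementary_tools", [])
    # Suggest only complements that are actually registered.
    return [selected] + [t for t in extras if t in registry]

plan = expand_plan("get_shopify_order_summary")
# Klaviyo and QuickBooks are named but not registered here, so only
# run_ga4_report is added to the plan.
```

Filtering against the registry also shows why deprecated tools must be removed: a complementary_tools entry pointing at a retired connector is a dead edge in the reasoning graph.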
returns: the output spec in plain language. The input schema tells the AI what to send; the output needs a parallel field. returns describes what comes back in plain language: a description of what the AI will receive, so it can judge whether this tool produces what the user needs. Without returns, the AI has to guess. With returns, it can route correctly.
inputSchema: parameter docs, but in English. The MCP spec requires typed input parameters. What the spec does not require is that those parameter descriptions be understandable. Default values matter. Many user queries do not specify dates. "Show me our revenue" implies some reasonable default window. Specifying defaults removes the ambiguity.
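A minimal sketch of that default behavior, assuming the 30-day window the example schema promises (the function name is illustrative):

```python
from datetime import date, timedelta

# Apply the defaults the schema documents: missing dates fall back to a
# 30-day window ending today.
def resolve_dates(args: dict) -> dict:
    today = date.today()
    return {
        "start_date": args.get("start_date") or (today - timedelta(days=30)).isoformat(),
        "end_date": args.get("end_date") or today.isoformat(),
    }

defaults = resolve_dates({})  # "show me our revenue" carries no dates
pinned = resolve_dates({"start_date": "2024-01-01", "end_date": "2024-03-31"})
```

Because the defaults are stated in the schema itself, the AI can call the tool with no arguments and still know exactly what window the answer covers.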
The connector pairs that need special attention
Some connector pairs in a large platform have overlapping semantic domains. These need disambiguation in the schema or they will misroute.
Shopify orders vs QuickBooks invoices. Both represent revenue events. The Shopify tool covers ecommerce transactions. The QuickBooks tool covers accounting-recognized revenue after reconciliation.
Gmail vs Slack. Both are communication channels. Schemas need to specify what kinds of communication live where.
Google Calendar vs HubSpot deals. Both contain meeting-like information. Calendar has the events. HubSpot has the sales context.
If your connector catalog has these kinds of pairs without disambiguation in the schemas, the AI will misroute predictably.
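One hedged sketch of what disambiguation looks like for the Shopify/QuickBooks pair. The wording below is illustrative, not prescribed: each description names its own domain and points at the sibling tool so the router can tell overlapping revenue questions apart.

```python
# Illustrative disambiguating descriptions for an overlapping pair.
descriptions = {
    "get_shopify_order_summary": (
        "Summarize Shopify ecommerce orders as transacted at checkout. "
        "For accounting-recognized revenue after reconciliation, use "
        "get_quickbooks_profit_loss instead."
    ),
    "get_quickbooks_profit_loss": (
        "Report QuickBooks accounting-recognized revenue and expenses "
        "after reconciliation. For raw ecommerce transaction volume, use "
        "get_shopify_order_summary instead."
    ),
}
```

The cross-reference works in both directions, so whichever tool the router considers first, the schema itself redirects it when the question belongs to the sibling.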
What happens when you skip all of this
We have run side-by-side evaluations of minimal schemas versus reasoning-ready schemas on identical connector sets. Same tools, same underlying APIs, same host LLM. Only the schemas differ.
The minimal-schema version misroutes approximately one in four queries. The reasoning-ready version misroutes closer to one in twenty. The difference compounds. In a session with ten queries, the minimal-schema version produces two or three wrong answers on average. The reasoning-ready version produces half a wrong answer on average.
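The compounding is worth making explicit. Taking the two misroute rates at face value, the arithmetic over a ten-query session looks like this:

```python
# Expected misroutes and the chance of at least one misroute per session.
def session_stats(misroute_rate: float, n_queries: int = 10) -> tuple[float, float]:
    expected = misroute_rate * n_queries
    at_least_one = 1 - (1 - misroute_rate) ** n_queries
    return expected, at_least_one

minimal = session_stats(0.25)  # 2.5 expected misroutes; ~94% chance of at least one
ready = session_stats(0.05)    # 0.5 expected misroutes; ~40% chance of at least one
```

With minimal schemas, nearly every ten-query session contains at least one wrong answer; with reasoning-ready schemas, most sessions contain none.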
Users feel the difference immediately. One product feels smart. The other feels broken.
The operational commitment
Schema design is not a one-time task. It is a living artifact that degrades without maintenance. Three operational habits keep this under control.
Schema validation in CI. Every PR that touches a connector runs a schema-to-implementation check. If the schema claims a field that the code does not return, CI fails. This is cheap to automate and prevents 90% of drift.
Quarterly schema review. Once a quarter, review the answers_questions_like field on every tool against the actual queries users sent. If users are asking things the exemplars do not cover, update the exemplars.
A connector retirement path. When a connector gets deprecated, its schema needs to go too. Leaving deprecated tools in the registry is the single most common cause of routing degradation in mature platforms.
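The schema-to-implementation check from the first habit can be sketched as a hypothetical CI gate. Field and function names below are illustrative; a real check would diff the schema's claims against the tool's actual response model.

```python
# Hypothetical CI gate: fail the build when the schema claims output
# fields the implementation does not return.
def missing_claims(claimed_fields: set, sample_output: dict) -> list:
    """Fields the schema promises that a real response did not contain."""
    return sorted(claimed_fields - sample_output.keys())

claimed = {"revenue_total", "order_count", "average_order_value", "top_products"}
sample = {"revenue_total": 18250.0, "order_count": 112, "average_order_value": 163.0}

drift = missing_claims(claimed, sample)
# drift is ["top_products"], so this build should fail: the schema
# promises a field the code no longer returns.
```

Running this against a recorded sample response on every connector PR is enough to catch most schema drift before it reaches the router.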
The bigger picture
The integration platforms that win the next five years will not be the ones with the most connectors. They will be the ones whose connectors are best at being reasoned about.
Connector count is a supply metric. Schema quality is a product metric. The two are not correlated. Some platforms will have thousands of connectors and mediocre schemas. They will feel broken. Some will have dozens of connectors and excellent schemas. They will feel magical.
The platforms that compete on schema quality, not connector count, are building durable advantage.