Lessons on Building MCP Servers (5 minute read)
A practical guide to designing MCP servers that guide AI models through multi-step workflows by embedding breadcrumbs rather than expecting models to plan ahead.
Deep dive
- Models don't have hidden planners—they scan available tools and pick whatever seems most probable based on conversation context, so servers must make the next call blindingly obvious at every step
- The author's Office server exposes 100+ tools but funnels models toward 8 core verbs through instructions, treating specialized tools as fallback/diagnostic options to prevent five-call detours for one-call jobs
- Consistent naming exploits probability: all Word tools are word_*, Excel tools excel_*, unified tools office_*; a model that just called office_inspect will naturally reach for office_patch next because the prefix matches
- Every tool response should include a breadcrumb dictionary with next_tools and usage hints showing exact call syntax; smaller models will copy these verbatim because it's the most likely token sequence
- Discovery should be a callable tool like office_help(goal=...) that returns structured recommendations with rationale and next steps, not prose documentation; called with no arguments it returns the catalogue, and with unknown input it returns the supported set instead of erroring
- Use stable addressing like anchors, IDs, or structured paths instead of byte offsets or natural-language descriptions that models lose between calls; if you return data the model has to describe back in natural language, your chain will misfire
- Collapse similar tools into mode parameters (dry_run, best_effort, safe, strict) rather than separate tools; discovery cost scales with tool count, not mode count, and models figure out escalation chains like dry_run → safe → strict on their own
- Return standardized diagnostic envelopes with named fields like matched_targets and unmatched_targets that create branching points and recovery loops without forcing the model to re-read the entire context
- Always provide read-only introspection tools so confused models can "look again" without destructive consequences; the penalty becomes one extra round-trip instead of broken files
- The design checklist includes: pick 5-10 core verbs and name them in instructions, use consistent prefixes, embed forward breadcrumbs in responses, provide stable addresses, give mutation tools mode enums, cache recovery loop calls, make repeat calls safe, and reject unknown arguments strictly
Decoder
- MCP (Model Context Protocol): A protocol for exposing tools and functions that AI models can call to interact with external systems and data sources
- Activation sets: The subset of available tools that are surfaced to the model at any given time, keeping the visible tool list small while maintaining access to a larger set
- Breadcrumbs: Structured hints embedded in tool responses that guide the model toward the next appropriate tool call in a workflow chain
Original article
Lessons on Building MCP Servers
I've been building MCP servers for a while now–I wrote about the general approach last year, started out by creating umcp, and I've recently opened up an Office server that's been battered by enough models against enough real documents that the patterns have settled.
I'm still not a fan of MCP, but what follows is what I've learned about making tool chains actually work, condensed from swearing at logs rather than reading papers.
Disclaimer: This is a condensed version of CHAINING.md, which was itself stapled together from a bunch of notes in my Obsidian vault. The full version has more code examples and a techniques inventory table that Opus just _had_ to add, and I've since beaten that out of it and restored most of the original text (minus typos).
The short version: the MCP servers I design do most of the work, while the model walks breadcrumbs.
Models don't plan
They look at the conversation, scan the tool list, and grab whatever looks more probable. That's it. There is no hidden planner. If you want chains that finish somewhere sensible, the server has to make the next call blindingly obvious at every step.
After a year or so, I have pared my approach down to these three things, roughly in order of how much pain they save you:
- A small named core verb set covering most intents
- Output that suggests the next call
- An addressing scheme that survives between calls–anchors, IDs, paths, anything but line numbers.
Core verbs beat surface area
The Office server exposes over 100 tools. Its get_instructions() funnels models toward eight:
…start with office_help, then prefer office_read, office_inspect, office_patch, office_table, office_template, office_audit, and word_insert_at_anchor. Treat specialised tools as fallback, diagnostic, legacy-compatibility, or expert tools when the core flow is insufficient.
That single sentence does an outsized amount of work–it tells the model there is a recommended path, that the path is verb-shaped (help -> read -> inspect -> patch -> audit), and that everything else is opt-in.
Without it, models cheerfully reach for word_parse_sow_template when office_read would do, and you end up with five-call detours for one-call jobs.
So I quickly realized that I needed to be ruthless about which tools to surface and when. The specialised ones still ship–hidden under a "for experts" framing, and a handful of legacy ones filtered out of tools/list entirely.
I also make liberal use of activation sets–the surface the model sees is small; the surface it can reach is large.
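As a rough sketch of the funnel plus activation sets in plain Python (the verb list matches the quote above, but the tool table and filter are illustrative, not the real server's code):

```python
# Illustrative only: get_instructions() names the core verbs, and an
# activation-set filter keeps the visible tool list small.
CORE_VERBS = [
    "office_help", "office_read", "office_inspect", "office_patch",
    "office_table", "office_template", "office_audit", "word_insert_at_anchor",
]

ALL_TOOLS = {  # name -> description; the real server has over 100 of these
    "office_read": "Read a document as structured text.",
    "office_patch": "Apply edits at stable anchors.",
    "word_parse_sow_template": "Expert/legacy: parse one specific template layout.",
}

def get_instructions() -> str:
    core = ", ".join(CORE_VERBS[1:])
    return (
        f"Start with {CORE_VERBS[0]}, then prefer {core}. "
        "Treat specialised tools as fallback, diagnostic, or expert options."
    )

def list_tools(active: set[str] | None = None) -> dict[str, str]:
    """Surface only the activation set; everything else stays reachable but unlisted."""
    active = active or set(CORE_VERBS)
    return {name: desc for name, desc in ALL_TOOLS.items() if name in active}
```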
Naming is the chain
Again, models chain whatever is most likely (or rhymes), and the most effective tactic, for me, has been taking advantage of that.
All Word tools are word_*, all Excel excel_*, all unified office_*. A model that just called office_inspect will reach for office_patch next, not word_patch_with_track_changes, because the prefix matches.
This particular server also makes liberal use of annotations and a little intent/inferrer hack that reads those prefixes to assign readOnlyHint/destructiveHint automatically, so naming discipline turns into safety metadata for free.
The prefix is the plan. The verb is the step. If you take one thing from this entire post, I'd suggest this notion…
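A hedged sketch of what that inferrer can look like, deriving the MCP readOnlyHint/destructiveHint annotations from the name alone; the verb lists here are invented for illustration, not the server's actual rules:

```python
# Naming discipline turned into safety metadata: prefix + verb is all we look at.
READ_VERBS = {"read", "inspect", "help", "audit", "list", "get"}
DESTRUCTIVE_VERBS = {"delete", "replace", "overwrite"}

def infer_annotations(tool_name: str) -> dict:
    # "word_insert_at_anchor" -> surface "word", verb "insert"
    surface, _, rest = tool_name.partition("_")
    verb = rest.split("_", 1)[0] if rest else surface
    return {
        "readOnlyHint": verb in READ_VERBS,
        "destructiveHint": verb in DESTRUCTIVE_VERBS,
    }

infer_annotations("office_inspect")      # readOnlyHint: True
infer_annotations("excel_delete_sheet")  # destructiveHint: True
```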
Every response nominates the next call
This was the single change that made things behave on smaller models. The big ones will plan a chain from a tool list and a goal; the wee ones won't–they grab the first plausible tool and stop.
The fix is stupid simple: every response ends with a breadcrumb dictionary of hints to follow. At minimum next_tools: [...], plus usage: "<exact call>" whenever the current tool produced a value the next one needs.
A model that can't assemble arguments from a schema can copy the usage string verbatim. In fact, it will copy it, because that is still the most likely outcome as it fills in tokens, and thus those usage hints funnel the path the model takes.
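The shape is roughly this (a sketch: the anchor values and the exact usage string are invented, only the next_tools and usage fields mirror the real envelopes):

```python
def office_inspect(path: str) -> dict:
    anchors = ["h1:introduction", "h2:scope", "h2:deliverables"]  # pretend we parsed the file
    return {
        "status": "ok",
        "anchors": anchors,
        # Breadcrumbs: nominate the next call and give syntax the model can copy verbatim.
        "next_tools": ["office_patch", "word_insert_at_anchor"],
        "usage": f'office_patch(path="{path}", anchor="h2:scope", mode="dry_run", text="...")',
    }
```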
Discovery as a tool, not documentation
Another thing I hit upon was that signposting needed to be curated.
Borrowing a page from intent mapping, office_help(goal=...) returns a structured record–recommended chain with rationale, fallbacks, diagnostic strings to watch for, one imperative next_step sentence. Not prose. Not a README, not skills. Data the model can act on without reading comprehension.
Called with no arguments, it returns the catalogue. Called with an unknown goal, it returns the supported set rather than an error, which turns a potential workflow-stopping failure into an actually useful catalogue.
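Something like this, sketched in plain Python (the goal keys and playbooks are invented; the point is the structured record and the graceful fallback):

```python
PLAYBOOKS = {
    "edit_document": {
        "chain": ["office_read", "office_inspect", "office_patch", "office_audit"],
        "rationale": "Read, locate anchors, patch at anchors, then verify.",
        "fallbacks": ["word_insert_at_anchor"],
        "watch_for": ["unmatched_targets"],
        "next_step": "Call office_read on the target file.",
    },
}

def office_help(goal: str | None = None) -> dict:
    if goal is None:              # no arguments: return the catalogue
        return {"supported_goals": sorted(PLAYBOOKS)}
    if goal not in PLAYBOOKS:     # unknown goal: return the supported set, don't error
        return {"unknown_goal": goal, "supported_goals": sorted(PLAYBOOKS)}
    return PLAYBOOKS[goal]
```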
Addressing: anchors, not offsets
The biggest reason simple models can't follow chains is the model losing the thread between calls. "Insert a paragraph after the introduction" is fine in English but catastrophic if you expect it to remember a byte offset across three tool calls.
In this particular scenario, I cheated: since most Office documents have headings (or cells, or internal structured paths inside OOXML), I used either verbatim text from the document or immovable coordinates (which was particularly hard in PowerPoint, by the way).
So besides suggestions and hints, return identifiers your tools will later accept as input. If you find yourself returning data the model has to describe back to you in natural language, you've made a chain that will misfire on a Tuesday afternoon when you're not watching.
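In code terms, the pattern is just: return identifiers that round-trip. A minimal sketch (the tool names and anchor format are illustrative):

```python
def office_map(path: str) -> dict:
    # Pretend we walked the document and collected headings as stable anchors.
    return {
        "anchors": [
            {"id": "h1:introduction", "text": "Introduction"},
            {"id": "h2:deliverables", "text": "Deliverables"},
        ],
        "usage": f'word_insert_at_anchor(path="{path}", anchor="h2:deliverables", text="...")',
    }

def word_insert_at_anchor(path: str, anchor: str, text: str) -> dict:
    # The anchor string round-trips unchanged: no byte offsets, no paraphrasing.
    return {"status": "ok", "inserted_after": anchor, "next_tools": ["office_audit"]}
```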
Modes turn one tool into four
I started out with individual editing tools per format, which made automated testing easy but was incredibly wasteful of context. At some point I decided to make things much simpler for initial discovery, and since I needed to make all outputs auditable, I tagged the available sub-operations by risk.
office_patch is the same code path whether you ask for dry_run, best_effort, safe, or strict. One tool, four modes, one entry in tools/list.
Discovery cost scales with tool count, not mode count. And dry_run -> safe -> strict is an escalation chain the model figures out on its own without being told.
If you have N tools that differ only in how cautious they are, collapse them. You're wasting everyone's context budget.
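Collapsing into modes can be as dull as an enum parameter; a sketch, with the mode names from above and everything else invented:

```python
from enum import Enum

class PatchMode(str, Enum):
    DRY_RUN = "dry_run"
    BEST_EFFORT = "best_effort"
    SAFE = "safe"
    STRICT = "strict"

def office_patch(path: str, anchor: str, text: str,
                 mode: PatchMode = PatchMode.DRY_RUN) -> dict:
    change = {"path": path, "anchor": anchor, "text": text}
    if mode is PatchMode.DRY_RUN:   # plan only, nothing touched
        return {"status": "planned", "would_change": change, "next_tools": ["office_patch"]}
    if mode is PatchMode.STRICT and not anchor_exists(path, anchor):
        return {"status": "error", "unmatched_targets": [anchor], "next_tools": ["office_inspect"]}
    # best_effort and safe share the same code path, with different tolerance for misses
    return {"status": "applied", "changed": change, "next_tools": ["office_audit"]}

def anchor_exists(path: str, anchor: str) -> bool:
    return True  # stand-in for real document introspection
```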
Diagnostics as the back-edge
Linear chains are easy. Real chains have loops, and loops only happen when the server invites the model back in. Every mutating tool returns a standard envelope with status, matched_targets, unmatched_targets, and next_tools.
The model then branches on a small subset of options "locally" without needing to go over the entire context, and if you name the diagnostic fields with exact strings the model will see again in your instructions, it will just reinforce them.
In this particular case, again, I cheated. I figured out that models were starting to call tools at random because they couldn't introspect the document well enough, and they ended up breaking files. So I always gave them at least one read-only tool, and the penalty for "I'm confused, let me look again" becomes one extra round-trip, not a destructive cock-up.
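Put together, the envelope plus the read-only escape hatch look roughly like this (field names from above; the branching values are illustrative):

```python
def mutation_envelope(matched: list[str], unmatched: list[str]) -> dict:
    if unmatched:
        return {
            "status": "partial",
            "matched_targets": matched,
            "unmatched_targets": unmatched,
            # Back-edge: send the model to a read-only tool before it retries blindly.
            "next_tools": ["office_inspect"],
        }
    return {
        "status": "ok",
        "matched_targets": matched,
        "unmatched_targets": [],
        "next_tools": ["office_audit"],
    }
```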
My MCP Design Checklist
- Pick five to ten core verbs and name them in get_instructions() or your local equivalent
- Use consistent prefixes by surface
- Provide a discovery tool that returns recommendations as data, not prose
- Make the discovery tool browseable–no-arg returns the catalogue, unknown input returns the supported set
- Embed forward breadcrumbs in every tool response
- Provide a map/anchors tool so addresses survive between calls
- Give every mutating tool a mode enum including dry_run
- Return named diagnostic fields and cite the recovery tools
- Standardise the mutation envelope. If one tool changes something in a specific way, make sure the others are consistent (arguments, semantics, etc.)
- Reject unknown arguments strictly (this is much easier in some runtimes than others)
- Provide an audit tool so the model has somewhere to land
- Cache anything the recovery loop calls more than once, because, well, it will get called dozens of times even if you carefully curate paths through your tooling with hints.
- Make repeat calls safe–models retry, and they should be allowed to (idempotence is hard, and often impossible).
Do the boring work in the schema and the descriptions. The model will happily do the clever bit if you stop making it guess.