How many tokens do your MCP servers burn before any work starts?
Every MCP server you connect loads its full tool list into the model’s context and resends it on every call, including the tools a task never touches. This page shows how large that overhead is for each server, what it costs, and what Code Mode removes.
Last measured 2026-06-09 · 7 servers · counted with o200k_base · method & data
Measured token weight by MCP server
Tool-definition tokens for 7 official and reference servers, heaviest first. Each server was run locally at the version shown and counted on 2026-06-09 with o200k_base. Counts reflect each server’s default startup configuration.
| MCP server | Vendor | Tools | Tokens |
|---|---|---|---|
| Notion @notionhq/notion-mcp-server@2.2.1 | Notion | 22 | 15,462 |
| Sentry @sentry/mcp-server@0.36.0 | Sentry | 22 | 13,823 |
| Supabase @supabase/mcp-server-supabase@0.8.2 | Supabase | 29 | 3,422 |
| Playwright @playwright/mcp@0.0.75 | Microsoft | 23 | 3,130 |
| Filesystem @modelcontextprotocol/server-filesystem@2026.1.14 | MCP reference | 14 | 1,641 |
| AWS API awslabs.aws-api-mcp-server@1.3.43 | AWS Labs | 2 | 1,554 |
| Git mcp-server-git@2026.6.4 | MCP reference | 12 | 1,119 |
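The table invites a quick per-tool comparison. The sketch below is only arithmetic on the figures above: it divides each server's tokens by its tool count and sums the total if all seven are connected. The function names are illustrative, not part of any tool or library.

```python
# Figures copied from the table above: (server, tools, tokens).
MEASURED = [
    ("Notion", 22, 15_462),
    ("Sentry", 22, 13_823),
    ("Supabase", 29, 3_422),
    ("Playwright", 23, 3_130),
    ("Filesystem", 14, 1_641),
    ("AWS API", 2, 1_554),
    ("Git", 12, 1_119),
]


def tokens_per_tool(rows):
    """Return {server: average definition tokens per tool}, rounded to one decimal."""
    return {name: round(tokens / tools, 1) for name, tools, tokens in rows}


def total_tokens(rows):
    """Sum the definition tokens loaded when every listed server is connected."""
    return sum(tokens for _, _, tokens in rows)


if __name__ == "__main__":
    averages = tokens_per_tool(MEASURED)
    # Print the densest definitions first.
    for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<11} {avg:>6.1f} tokens/tool")
    print(f"All seven connected: {total_tokens(MEASURED):,} tokens")
```

On these figures, the AWS API server averages 777 tokens per tool across just 2 tools, while Supabase averages 118 across 29. Connecting all seven servers loads 40,151 tokens of definitions.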
How these numbers are measured
Each server is run locally over stdio at the version shown, and its live tools/list response is captured. That response holds the exact definitions a client loads into the model’s context. Every tool is serialized as its name, description, and JSON-Schema parameters and counted with tiktoken’s o200k_base encoder. Counts from a model’s own tokenizer differ by roughly 10 to 20 percent.
The count reflects each server’s default startup configuration. Some servers expose more tools behind flags or a connected account, so the figure here is what loads out of the box. Each row lists the exact package and version, so anyone can run the same server and get the same number.
This pass covers official and reference servers that run locally as open source. Remote, OAuth-only servers such as Linear, Atlassian, and the hosted GitHub server are not included yet: measuring them honestly needs a real authenticated connection rather than a guess.
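The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the table. It assumes each tool arrives as a dict with `name`, `description`, and `inputSchema` keys (the shape of a tools/list entry) and that the serialization is compact JSON. The page does not state the exact serialization format, so that part is a guess. The encoder is passed in as a function so that tiktoken is only needed when you want o200k_base counts.

```python
import json
from typing import Callable, Iterable


def serialize_tool(tool: dict) -> str:
    """Serialize one tool as name, description, and JSON-Schema parameters.

    ASSUMPTION: compact JSON with these three keys. The exact format used
    for the published numbers is not stated.
    """
    payload = {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "parameters": tool.get("inputSchema", {}),
    }
    return json.dumps(payload, separators=(",", ":"), ensure_ascii=False)


def count_definition_tokens(tools: Iterable[dict], encode: Callable[[str], list]) -> int:
    """Sum the token counts of every serialized tool definition."""
    return sum(len(encode(serialize_tool(tool))) for tool in tools)


def o200k_encoder() -> Callable[[str], list]:
    """Return tiktoken's o200k_base encode function (needs `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the rest of the sketch runs without it

    return tiktoken.get_encoding("o200k_base").encode


# Hypothetical tool definition, shaped like one entry of a tools/list result.
SAMPLE_TOOLS = [
    {
        "name": "read_file",
        "description": "Read a file from disk.",
        "inputSchema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    }
]
```

With tiktoken installed, `count_definition_tokens(SAMPLE_TOOLS, o200k_encoder())` gives the o200k_base count for the sample. Any other tokenizer's encode function can be passed in the same way to see how far its count drifts.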
Why tool definitions cost tokens
A language model keeps no memory between calls. It re-reads its entire context every time it runs. Tool definitions are part of that context. Every tool a server exposes, including its name, description, and full JSON-Schema parameters, is re-sent on every request, whether or not the tool is used.
The cost grows in two ways. Connect several servers and tens of thousands of tokens of schemas load before the first instruction. Then each tool result re-enters the context as the task runs and stays there. The first is a fixed entry cost. The second accumulates over the course of a task and is the subject of context engineering.
This is a footprint problem first and a cost problem second. Prompt caching makes re-reading the definitions cheaper, but the tokens still occupy the window on every call. A fuller window leaves less room for the actual work and reaches rate limits sooner.
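The footprint-versus-cost distinction can be made concrete with a little arithmetic. The sketch below assumes a simple billing model: the first call pays the full input rate and later calls pay a fraction of it when caching applies. The price, cached fraction, and context-window size in the example are hypothetical placeholders, not any provider's actual figures. Cache-write surcharges and cache expiry are ignored.

```python
from typing import Optional


def definition_tokens_processed(definition_tokens: int, calls: int) -> int:
    """Total definition tokens the model re-reads over a task of `calls` requests."""
    return definition_tokens * calls


def window_share(definition_tokens: int, context_window: int) -> float:
    """Fraction of the context window the definitions occupy on every call."""
    return definition_tokens / context_window


def definition_cost(
    definition_tokens: int,
    calls: int,
    price_per_million: float,
    cached_fraction: Optional[float] = None,
) -> float:
    """Dollar cost of re-reading the definitions over a task.

    ASSUMPTION: with caching, call 1 is billed at the full rate and every
    later call at `price_per_million * cached_fraction`.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_call = definition_tokens * price_per_million / 1_000_000
    if cached_fraction is None:
        return per_call * calls
    return per_call + per_call * cached_fraction * (calls - 1)


if __name__ == "__main__":
    defs = 15_462 + 13_823  # Notion + Sentry, from the table
    calls = 20              # hypothetical task length
    print(definition_tokens_processed(defs, calls))
    print(f"{window_share(defs, 200_000):.1%}")            # hypothetical 200k window
    print(f"{definition_cost(defs, calls, 3.0):.4f}")      # hypothetical $3 per million
    print(f"{definition_cost(defs, calls, 3.0, 0.1):.4f}")  # hypothetical 10% cached rate
```

With Notion and Sentry connected (29,285 tokens combined) and a 20-call task, the model re-reads 585,700 definition tokens. Under the placeholder prices, caching cuts the bill from about $1.76 to about $0.25. The window share is the same either way: about 14.6 percent of the assumed 200,000-token window.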
What reduces it
- Connect fewer servers
  - Load only the servers a task needs. Unused tools still cost their full weight on every call.
- Filter or lazy-load tools
  - Expose a subset of a server’s tools, or load definitions on demand. A proxy in front of several servers can do the same thing.
- Prompt caching
  - Bills the unchanged, re-sent prefix at a fraction of the rate. It is the easiest change to make and needs no new architecture. It lowers the cost of re-reading the definitions, not their footprint in the window.
- Programmatic tool-calling (Code Mode)
  - The model writes one program against a compact typed interface, so definitions load once and intermediate results stay in the runtime. On multi-tool, large-result work, Anthropic reported a task dropping from about 150,000 tokens to roughly 2,000.
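Filtering and lazy-loading are the simplest of these options to picture in code. The sketch below is an illustration only: the tool names are made up, the catalog class is hypothetical rather than part of any MCP SDK, and size is measured in characters of compact JSON as a rough stand-in for tokens.

```python
import json

# Hypothetical tool definitions, shaped like entries of a tools/list result.
TOOLS = [
    {
        "name": "search_issues",
        "description": "Search issues by query.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "create_issue",
        "description": "Create a new issue.",
        "inputSchema": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
            "required": ["title"],
        },
    },
    {
        "name": "delete_project",
        "description": "Delete a project.",
        "inputSchema": {
            "type": "object",
            "properties": {"project_id": {"type": "string"}},
            "required": ["project_id"],
        },
    },
]


def filter_tools(tools, allowed_names):
    """Keep only the tool definitions whose names are on the allowlist."""
    allowed = set(allowed_names)
    return [tool for tool in tools if tool["name"] in allowed]


def definition_chars(tools):
    """Rough size proxy: characters of compact JSON (swap in a tokenizer for tokens)."""
    return sum(len(json.dumps(tool, separators=(",", ":"))) for tool in tools)


class LazyToolCatalog:
    """Expose names and descriptions up front; hand out full schemas on demand."""

    def __init__(self, tools):
        self._by_name = {tool["name"]: tool for tool in tools}

    def summaries(self):
        """Short entries without the JSON-Schema parameters."""
        return [
            {"name": name, "description": tool.get("description", "")}
            for name, tool in self._by_name.items()
        ]

    def load(self, name):
        """Return the full definition for one tool when it is actually needed."""
        return self._by_name[name]


if __name__ == "__main__":
    kept = filter_tools(TOOLS, ["search_issues"])
    print(definition_chars(TOOLS), definition_chars(kept))
    print(LazyToolCatalog(TOOLS).summaries())
```

A proxy in front of several servers could apply the same allowlist per task. The trade-off with lazy loading is an extra round trip whenever the model asks for a schema it has not yet seen.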
Common questions
- What is an MCP server?
  - An MCP server is a program that exposes a set of tools to an AI model over the Model Context Protocol, the open standard for connecting models to outside systems. Each tool carries a name, a description, and a JSON-Schema definition of its inputs. Those definitions are what load into the model context and cost tokens, which is what this page measures.
- Why do MCP servers cost tokens?
  - When a server connects, its full tool list loads into the model context: names, descriptions, and JSON-Schema parameters for every tool. That list is sent to the model before any work happens, and the more servers and tools you connect, the larger this fixed overhead becomes.
- Why does the cost apply on every call, not just once?
  - A language model keeps no state between calls. It re-reads its entire context each time it runs. Tool definitions live in that context, so they are re-sent on every request for the length of a task, and an agent task is usually many calls.
- Why aren’t Linear, Jira, or GitHub in the table?
  - Those run as remote, OAuth-only servers, and GitHub’s official server ships as a Docker image. Measuring them accurately needs a real authenticated connection, so rather than publish a guess they are left out of this pass. Every server here is one that runs locally as open source, where the tool list can be captured exactly and reproduced.
- Which MCP servers are the heaviest?
  - In this set the Notion server (about 15,500 tokens across 22 tools) and the Sentry server (about 13,800 across 22) are the heaviest, more than ten times a small reference server like Git (about 1,100). Verbose, deeply nested parameter schemas drive the weight more than the raw tool count does.
- Does prompt caching make it free?
  - No. Prompt caching bills the re-sent, unchanged prefix at a fraction of the normal rate, which lowers the dollar cost of re-reading the definitions. It does not reduce how much of the context window they take up, and it does not help on the first call or after the cache expires. The tokens also still count toward rate limits. Caching lowers the cost. It does not remove the overhead.
- How were these numbers measured?
  - Each server was run locally at the version listed and its live tools/list response captured. Every tool was then serialized as its name, description, and JSON-Schema parameters and counted with tiktoken’s o200k_base encoder. Counts reflect each server’s default startup configuration; some servers expose more tools behind flags or a connected account. Every row shows the exact package and version, so the measurement is reproducible. Token counts vary by roughly 10 to 20 percent across different models’ tokenizers.
- What reduces the token overhead?
  - Connect fewer servers, filter to the tools a task actually needs, and load definitions lazily or behind a proxy. Prompt caching lowers the cost of re-reading the definitions but not their footprint in the window. Code Mode takes a different route: the model writes one program against a compact typed interface, so definitions load once and intermediate results stay out of the context. On multi-tool, large-result work, Anthropic reported a single task falling from about 150,000 tokens to roughly 2,000.
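The last answer's point about intermediate results staying out of the context can be shown with a toy comparison. This is a simulation under loud assumptions, not how any Code Mode product is implemented: `fetch_orders` is a hypothetical stub tool, the "context" is just a list of strings, and size is measured in characters rather than tokens.

```python
import json


def fetch_orders():
    """Hypothetical tool that returns a large result set."""
    return [
        {"id": i, "total": i * 10, "status": "open" if i % 4 == 0 else "closed"}
        for i in range(500)
    ]


def direct_tool_calling():
    """Direct calls: every tool result is serialized back into the model's context."""
    context = []
    orders = fetch_orders()
    context.append(json.dumps(orders))  # the full result enters the context
    open_total = sum(o["total"] for o in orders if o["status"] == "open")
    context.append(json.dumps({"open_total": open_total}))
    return open_total, sum(len(entry) for entry in context)


def programmatic_tool_calling():
    """Code Mode style: a program runs in a runtime; only its return value is reported."""

    def program():
        # Stand-in for the program the model would write. The orders never leave it.
        orders = fetch_orders()
        return {"open_total": sum(o["total"] for o in orders if o["status"] == "open")}

    result = program()
    context_chars = len(json.dumps(result))  # only the summary enters the context
    return result["open_total"], context_chars


if __name__ == "__main__":
    print(direct_tool_calling())
    print(programmatic_tool_calling())
```

Both paths arrive at the same answer. The difference is that the second path never puts the 500 order records in front of the model, so the gap between the two widens as results get larger or more tools are chained together.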