Question 1

What is an MCP server?

Accepted Answer

An MCP server is a program that exposes a set of tools to an AI model over the Model Context Protocol, the open standard for connecting models to outside systems. Each tool carries a name, a description, and a JSON-Schema definition of its inputs. Those definitions are what load into the model context and cost tokens, which is what this page measures.
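As a rough illustration of what one of those definitions looks like, the sketch below builds a single hypothetical tool in the shape of an entry from a `tools/list` response and serializes the three fields that end up in context. The tool name, description, and schema are invented for the example and do not come from any server on this page.

```python
import json

# Hypothetical tool, shaped like one entry in an MCP tools/list response.
example_tool = {
    "name": "read_file",
    "description": "Read the contents of a file at the given path.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read."}
        },
        "required": ["path"],
    },
}


def serialize_tool(tool: dict) -> str:
    """Serialize the name, description, and JSON-Schema parameters of a tool."""
    return json.dumps(
        {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["inputSchema"],
        },
        separators=(",", ":"),
    )


serialized = serialize_tool(example_tool)
print(len(serialized), "characters of definition text for one small tool")
```

Every extra property, enum, or nested object in `inputSchema` lengthens this string, and with it the token count.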

Question 2

Why do MCP servers cost tokens?

Accepted Answer

When a server connects, its full tool list loads into the model context: names, descriptions, and JSON-Schema parameters for every tool. That list is sent to the model before any work happens, and the more servers and tools you connect, the larger this fixed overhead becomes.
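A minimal sketch of that fixed overhead, using three token counts from the table on this page. The choice of servers and the 200,000-token context window are assumptions made for the example, not properties of any particular model.

```python
# Token counts taken from the table on this page.
MEASURED_TOKENS = {
    "Notion": 15_462,
    "Playwright": 3_130,
    "Git": 1_119,
}

# Assumed context window size, for illustration only.
ASSUMED_CONTEXT_WINDOW = 200_000


def fixed_overhead(connected: list[str]) -> int:
    """Sum the definition tokens of every connected server."""
    return sum(MEASURED_TOKENS[name] for name in connected)


overhead = fixed_overhead(["Notion", "Playwright", "Git"])
share = overhead / ASSUMED_CONTEXT_WINDOW
print(f"{overhead:,} tokens, {share:.1%} of the assumed window")
```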

Question 3

Why does the cost apply on every call, not just once?

Accepted Answer

A language model keeps no state between calls. It re-reads its entire context each time it runs. Tool definitions live in that context, so they are re-sent on every request for the length of a task, and an agent task is usually many calls.
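The arithmetic is simple multiplication. The sketch below uses the Notion count from this page and assumes a 30-call task; the call count is a made-up figure chosen only to show the scale.

```python
def definition_tokens_per_task(definition_tokens: int, calls: int) -> int:
    """Tokens spent re-sending tool definitions over a whole task."""
    return definition_tokens * calls


# 15,462 is the Notion count from this page; 30 calls is an assumption.
total = definition_tokens_per_task(15_462, 30)
print(f"{total:,} tokens of definitions re-read across the task")
```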

Question 4

Why aren’t Linear, Jira, or GitHub in the table?

Accepted Answer

Those run as remote, OAuth-only servers, and GitHub’s official server ships as a Docker image. Measuring them accurately needs a real authenticated connection, so, rather than publish a guess, this pass leaves them out. Every server here runs locally as open source, where the tool list can be captured exactly and reproduced.

Question 5

Which MCP servers are the heaviest?

Accepted Answer

In this set the Notion server (about 15,500 tokens across 22 tools) and the Sentry server (about 13,800 tokens across 22 tools) are the heaviest, each more than ten times the weight of a small reference server like Git (about 1,100 tokens). Verbose, deeply nested parameter schemas drive the weight more than the raw tool count does.
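Dividing each server's token count by its tool count makes the schema effect visible. The sketch below uses only the figures from the table on this page; the per-tool average is a derived number, not a separate measurement.

```python
# (tools, tokens) per server, copied from the table on this page.
SERVERS = {
    "Notion": (22, 15_462),
    "Sentry": (22, 13_823),
    "Supabase": (29, 3_422),
    "Playwright": (23, 3_130),
    "Filesystem": (14, 1_641),
    "AWS API": (2, 1_554),
    "Git": (12, 1_119),
}


def tokens_per_tool(tools: int, tokens: int) -> float:
    """Average definition weight of a single tool."""
    return tokens / tools


# Rank servers by average weight per tool, heaviest first.
ranked = sorted(
    SERVERS.items(),
    key=lambda item: tokens_per_tool(*item[1]),
    reverse=True,
)

for name, (tools, tokens) in ranked:
    average = tokens_per_tool(tools, tokens)
    print(f"{name:<11}{tools:>3} tools {tokens:>7,} tokens {average:>6.0f} per tool")
```

Supabase exposes the most tools (29) yet lands near the bottom per tool, while the two-tool AWS API server has the heaviest average definition in the set.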

Question 6

Does prompt caching make it free?

Accepted Answer

No. Prompt caching bills the re-sent, unchanged prefix at a fraction of the normal rate, which lowers the dollar cost of re-reading the definitions. It does not reduce how much of the context window they take up, and it does not help on the first call or after the cache expires. The tokens also still count toward rate limits. Caching lowers the cost; it does not remove the overhead.
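A back-of-the-envelope sketch of the difference between billing and footprint. The cached rate of 0.1 (one tenth of the normal input price) and the 30-call task are assumptions for illustration; real rates vary by provider, and this ignores any cache-write surcharge or cache expiry during the task.

```python
def billed_equivalent_tokens(definition_tokens: int, calls: int, cached_rate: float) -> float:
    """Definition tokens billed over a task, expressed at the full input rate.

    The first call pays full price; later calls pay `cached_rate` of it.
    """
    if calls <= 0:
        return 0.0
    first_call = definition_tokens
    later_calls = definition_tokens * cached_rate * (calls - 1)
    return first_call + later_calls


NOTION_TOKENS = 15_462   # from this page
ASSUMED_CALLS = 30       # hypothetical task length
ASSUMED_CACHED_RATE = 0.1  # hypothetical discount, not a quoted price

uncached = NOTION_TOKENS * ASSUMED_CALLS
cached = billed_equivalent_tokens(NOTION_TOKENS, ASSUMED_CALLS, ASSUMED_CACHED_RATE)

print(f"billed without caching: {uncached:,}")
print(f"billed with caching:    {cached:,.0f}")
# The window footprint is unchanged: every call still carries the full list.
print(f"window footprint per call: {NOTION_TOKENS:,}")
```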

Question 7

How were these numbers measured?

Accepted Answer

Each server was run locally at the version listed and its live tools/list response captured, then every tool was serialized as its name, description, and JSON-Schema parameters and counted with tiktoken’s o200k_base encoder. Counts reflect each server’s default startup; some expose more tools behind flags or a connected account. Every row shows the exact package and version, so the measurement is reproducible. Token counts vary by roughly 10 to 20 percent across different models’ tokenizers.
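The counting step can be sketched roughly as follows. The exact serialization used for this page is not specified, so the JSON layout here is an assumption, and counts from this sketch may not match the table. `tiktoken.get_encoding("o200k_base")` is the real tiktoken call; the import is deferred so the counting function itself has no third-party dependency.

```python
import json
from typing import Callable, Iterable


def count_tool_tokens(tools: Iterable[dict], encode: Callable[[str], list]) -> int:
    """Count tokens across tools, each serialized as name, description, parameters."""
    total = 0
    for tool in tools:
        text = json.dumps(
            {
                "name": tool["name"],
                "description": tool.get("description", ""),
                # Assumed layout: the JSON-Schema input definition as "parameters".
                "parameters": tool.get("inputSchema", {}),
            }
        )
        total += len(encode(text))
    return total


def o200k_encode() -> Callable[[str], list]:
    """Return tiktoken's o200k_base encode function (requires `pip install tiktoken`)."""
    import tiktoken

    return tiktoken.get_encoding("o200k_base").encode
```

Given `tools` as the list from a captured `tools/list` response, `count_tool_tokens(tools, o200k_encode())` returns the total. Passing a different `encode` function is one way to see how far another tokenizer's count drifts.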

Question 8

What reduces the token overhead?

Accepted Answer

Connect fewer servers, filter to the tools a task actually needs, and load definitions lazily or behind a proxy. Prompt caching lowers the cost of re-reading the definitions but not their footprint in the window. Code Mode takes a different route: the model writes one program against a compact typed interface, so definitions load once and intermediate results stay out of the context. On multi-tool, large-result work, Anthropic reported a single task falling from about 150,000 tokens to roughly 2,000.
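Filtering is the simplest of these to sketch: keep only the definitions a task needs before they reach the model. The tool names below are hypothetical, and where the filter runs (client, proxy, or server flag) depends on the setup.

```python
def filter_tools(tools: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only the tool definitions whose names are on the allowlist."""
    return [tool for tool in tools if tool["name"] in allowed]


# Hypothetical tool list, standing in for a server's tools/list response.
all_tools = [
    {"name": "search_pages", "description": "Search pages by title."},
    {"name": "create_page", "description": "Create a new page."},
    {"name": "delete_page", "description": "Delete a page."},
]

# A read-only task only needs search, so only that definition is loaded.
needed = filter_tools(all_tools, {"search_pages"})
print([tool["name"] for tool in needed])
```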

MCP server	Package and version	Vendor	Tools	Tokens
Notion	@notionhq/notion-mcp-server@2.2.1	Notion	22	15,462
Sentry	@sentry/mcp-server@0.36.0	Sentry	22	13,823
Supabase	@supabase/mcp-server-supabase@0.8.2	Supabase	29	3,422
Playwright	@playwright/mcp@0.0.75	Microsoft	23	3,130
Filesystem	@modelcontextprotocol/server-filesystem@2026.1.14	MCP reference	14	1,641
AWS API	awslabs.aws-api-mcp-server@1.3.43	AWS Labs	2	1,554
Git	mcp-server-git@2026.6.4	MCP reference	12	1,119

How many tokens do your MCP servers burn before any work starts?

