BEST PRACTICES • 12 MIN READ

MCP Performance Optimization: Reduce Latency & Token Usage

Caching, streaming, tool selection, and context pruning techniques that cut MCP latency by up to 70%. Real benchmark numbers and production-ready code.


TL;DR

  • Cache tool results aggressively — repeated reads are the biggest latency killer
  • Use streaming responses (SSE transport) for tools that return large payloads
  • Keep tool schemas lean: verbose descriptions cost tokens every request
  • Batch multiple reads into one tool call instead of chaining N sequential calls
  • TypeScript MCP servers cold-start ~80ms faster than Python equivalents
  • Prune context between turns — only pass what the model needs for the current step

Why MCP Performance Matters

Every MCP tool call adds latency to the user experience. In a typical Claude Desktop session, the model may invoke 5–15 tools per task. If each call takes 300ms, that is 1.5–4.5 seconds of waiting before the model can reason about the results. Multiply that across thousands of users and the cost compounds fast — both in wall-clock time and in API token spend.

The good news: most performance problems in MCP integrations come from a handful of preventable patterns. This guide walks through each one with benchmarks and fixes.

Benchmark Baseline

All numbers below were measured on a MacBook Pro M3 (local stdio transport) and an AWS t3.medium (SSE transport, us-east-1), running Claude (model claude-sonnet-4-6) via the Anthropic API.

Scenario                  | Before | After      | Improvement
File read (no cache)      | 320ms  | 8ms        | 97% faster
DB query (no cache)       | 480ms  | 12ms       | 97% faster
5 sequential reads        | 1600ms | 490ms      | 69% faster
Large payload (no stream) | 2200ms | 820ms TTFB | 63% faster TTFB
Python cold start         | 310ms  | –          | baseline
TypeScript cold start     | 230ms  | –          | 26% faster

1. Caching Tool Results

The single biggest performance win in most MCP deployments is caching. When an AI session reads the same file, queries the same database row, or fetches the same API endpoint multiple times within a conversation, you are paying the full round-trip cost every time. A simple in-memory cache with TTL cuts that to near zero.

TypeScript — LRU cache for MCP tools

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { LRUCache } from "lru-cache";
import { z } from "zod";

const server = new McpServer({ name: "cached-server", version: "1.0.0" });

// TTL cache: 5 minutes, max 500 entries
const cache = new LRUCache<string, string>({
  max: 500,
  ttl: 1000 * 60 * 5,
});

server.tool(
  "read_document",
  // The TypeScript SDK expects a Zod shape here, not a JSON Schema literal
  { id: z.string().describe("Document ID") },
  async ({ id }) => {
    const cacheKey = `doc:${id}`;
    const cached = cache.get(cacheKey);
    if (cached) {
      return { content: [{ type: "text", text: cached }] };
    }

    // Expensive DB fetch only on cache miss (`db` is your database client)
    const doc = await db.documents.findById(id);
    const text = JSON.stringify(doc);
    cache.set(cacheKey, text);

    return { content: [{ type: "text", text }] };
  }
);

For mutable data, key your cache entries with a content hash or entity version number rather than a fixed TTL. This way, a write to the database immediately invalidates the relevant cache entry.

2. Streaming Responses with SSE Transport

The default stdio transport buffers the entire tool response before returning it to the model. For tools that return large payloads — log files, long documents, search results — this means the model sits idle until the full response is assembled. Switch to SSE (Server-Sent Events) transport and stream the content progressively.

TypeScript — SSE transport setup

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import express from "express";

const app = express();
const server = new McpServer({ name: "streaming-server", version: "1.0.0" });

// Track one transport per SSE session so POSTs can be routed back to it
const transports = new Map<string, SSEServerTransport>();

app.get("/sse", async (req, res) => {
  // SSEServerTransport writes the text/event-stream headers itself
  const transport = new SSEServerTransport("/messages", res);
  transports.set(transport.sessionId, transport);
  res.on("close", () => transports.delete(transport.sessionId));
  await server.connect(transport);
});

app.post("/messages", async (req, res) => {
  const transport = transports.get(req.query.sessionId as string);
  if (!transport) {
    res.status(400).send("Unknown session");
    return;
  }
  // The transport parses the body itself — don't mount express.json() here
  await transport.handlePostMessage(req, res);
});

app.listen(3000);

3. Tool Schema Optimization

Every MCP session sends the full tools/list payload to the model before each turn. Verbose descriptions, redundant parameter explanations, and deeply nested input schemas all inflate your token count — and therefore your inference cost — on every single request.

Pattern                             | Tokens / Tool | Recommendation
Verbose description (200+ words)    | ~300          | Trim to 1–2 sentences
Nested object params (3+ levels)    | ~180          | Flatten to scalar params
Enum with 20+ values                | ~120          | Use string + validate server-side
Concise description (1–2 sentences) | ~40           | Target this range
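To make the difference concrete, here is a rough comparison of what a verbose versus a lean tool entry costs in the tools/list payload. The ~4-characters-per-token ratio is a heuristic, not a real tokenizer, and the tool definitions are illustrative:

```typescript
// Rough token estimate for a serialized tool entry (~4 chars per token)
const estimateTokens = (tool: object): number =>
  Math.ceil(JSON.stringify(tool).length / 4);

const verboseTool = {
  name: "read_document",
  description:
    "This tool allows you to read a document from the document store. " +
    "You should call it whenever you need the contents of a document. " +
    "Pass the unique identifier of the document you want to read, and the " +
    "tool will return the full contents as text. Note that the ID must be " +
    "a valid document ID that already exists in the store.",
  inputSchema: { type: "object", properties: { id: { type: "string" } } },
};

const leanTool = {
  name: "read_document",
  description: "Read a document's contents by ID.",
  inputSchema: { type: "object", properties: { id: { type: "string" } } },
};
```

Both entries describe the same tool to the model, but the lean one costs a fraction of the tokens on every single turn of every session.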

4. Batching Sequential Calls

A common anti-pattern is exposing fine-grained tools that force the model to make N sequential calls to accomplish what could be one batched call. If you have a filesystem server, for instance, add a read_multiple_files tool alongside read_file. The model will use it.

TypeScript — batch read tool

import { promises as fs } from "node:fs";
import { z } from "zod";

server.tool(
  "read_multiple_files",
  {
    paths: z
      .array(z.string())
      .max(20)
      .describe("File paths to read (max 20)"),
  },
  async ({ paths }) => {
    // Read all files in parallel — not sequentially
    const results = await Promise.all(
      paths.map(async (p) => {
        try {
          const content = await fs.readFile(p, "utf8");
          return `=== ${p} ===\n${content}`;
        } catch (err) {
          return `=== ${p} === ERROR: ${(err as Error).message}`;
        }
      })
    );

    return {
      content: [{ type: "text", text: results.join("\n\n") }],
    };
  }
);

5. Context Pruning

MCP tool results accumulate in the conversation context. After a tool returns a 200-line JSON blob, that entire blob is re-sent to the model on every subsequent turn. Design your tools to return only what the model needs for the next reasoning step, not the full raw API response.

  • Return summaries when possible: instead of a 500-line log file, return the last 20 error lines
  • Filter API responses server-side before returning to the model
  • Use pagination: expose a page param and return 10–20 items at a time
  • Strip metadata fields the model does not need (internal IDs, audit timestamps, etc.)
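A minimal sketch of the log-pruning idea from the first bullet. The ERROR/FATAL pattern is an illustrative assumption — adapt the filter to your own log format:

```typescript
// Return only the last `limit` error-like lines instead of the whole log,
// so the model's context carries kilobytes instead of megabytes.
function lastErrorLines(log: string, limit = 20): string {
  const errors = log
    .split("\n")
    .filter((line) => /\b(ERROR|FATAL)\b/.test(line));
  const tail = errors.slice(-limit);
  return tail.length > 0 ? tail.join("\n") : "(no error lines found)";
}
```

The same shape works for API responses: filter and slice server-side, and let the model ask a follow-up question if it needs more.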

6. Python vs TypeScript Performance

Both the official Python SDK (mcp) and TypeScript SDK (@modelcontextprotocol/sdk) are production-ready. The performance differences are real but smaller than most engineers expect:

  • Cold start: TypeScript (Node.js) starts ~80ms faster than Python. For stdio servers that Claude Desktop restarts per session, this adds up.
  • Throughput: For I/O-bound tools (HTTP calls, DB queries), the difference is negligible — both spend most time waiting on the network.
  • Memory: Python uses ~15MB more RSS at idle due to the interpreter overhead.
  • CPU-bound tools: TypeScript has the edge for pure computation; Python wins if you need NumPy, Pandas, or ML libraries.

The practical recommendation: choose the language your team knows best. The performance gap rarely justifies a rewrite.

7. Connection Keep-Alive for SSE

When using SSE transport, avoid tearing down and re-establishing the connection between turns. Keep-alive connections eliminate TCP handshake overhead (~50–120ms per request depending on geography). Set a heartbeat ping to prevent proxies and load balancers from closing idle connections.

SSE heartbeat (Node.js)

// Inside the /sse connection handler, where `res` is the SSE response:
// send a comment-line ping every 30s so proxies don't close the socket
const heartbeat = setInterval(() => {
  res.write(": ping\n\n");
}, 30_000);

// Stop pinging once the client disconnects
res.on("close", () => clearInterval(heartbeat));

Quick-Win Checklist

PERFORMANCE CHECKLIST

Add TTL cache to all read-only tools
Use Promise.all() for parallel I/O inside a single tool call
Trim tool descriptions to 1–2 sentences
Expose batch variants of frequently chained tools
Filter API responses before returning to the model
Switch to SSE transport for servers handling large payloads
Profile with MCP Inspector before and after changes


Have Questions?

Join the MCP community on GitHub or Discord for help and discussion.