
April 2026

What MCP gives an agent that REST cannot

Ask Claude Code to produce a 6-second product teaser with a hero shot, ambient music, and a voiceover. With a REST stack you write three polling clients, a credit ledger, and a retry policy. With MCP you write nothing. The agent calls three tools and hands you the URLs.

The shape of the problem

Model Context Protocol (MCP) is an open standard from Anthropic for connecting language models to external tools. The model sees a typed tool surface: names, JSON Schemas, and a way to invoke. It does not see HTTP verbs, bearer tokens, exponential backoff, or webhook signatures. Those concerns live in the MCP server, which is the adapter between an agent and the rest of the world.

Before MCP, every team that wanted Claude or Cursor to generate an image hand-rolled the same scaffolding. An HTTP client. An API key stashed somewhere. A polling loop. A storage strategy for the resulting URL. Then they wrote a system prompt convincing the model to call their wrapper correctly. With MCP, the wrapper moves out of the prompt and into a tool the agent already knows how to call.
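The typed tool surface is concrete enough to sketch. A minimal model of what an agent receives from tools/list might look like the following; the descriptor shape follows MCP's tools/list result, but the specific fields, enum values, and the missingRequired helper are illustrative, not AgentFramer's published schema.

```typescript
// Sketch of the typed surface an agent sees at handshake. The descriptor
// shape mirrors an MCP tools/list entry (name, description, JSON Schema);
// the concrete properties below are illustrative.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; enum?: string[] }>;
    required?: string[];
  };
}

const generateImage: ToolDescriptor = {
  name: "generate_image",
  description: "Generate an image from a text prompt.",
  inputSchema: {
    type: "object",
    properties: {
      model: { type: "string", enum: ["flux-1.1-pro", "flux-schnell"] },
      prompt: { type: "string" },
      aspect_ratio: { type: "string" },
      seed: { type: "integer" },
    },
    required: ["model", "prompt"],
  },
};

// Because the schema travels with the tool, a host can reject a malformed
// call before it ever reaches the server:
function missingRequired(
  tool: ToolDescriptor,
  args: Record<string, unknown>,
): string[] {
  return (tool.inputSchema.required ?? []).filter((k) => !(k in args));
}
```

This is the wrapper moving out of the system prompt: instead of prose convincing the model to call an HTTP client correctly, the schema itself constrains the call.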

Why media generation is the use case MCP was built for

Three properties make media generation uniquely well-suited to a typed tool protocol — and uniquely painful to glue together with REST:

  • It is asynchronous. FLUX 1.1 pro returns a 1024x1024 image in 3-6 seconds. Sora 2 takes 90-180 seconds for a 10-second clip. Veo 3 sits at 60-120 seconds. Suno and Stable Audio 2 land between 8 and 30 seconds for a 30-second track. Agents cannot block on a synchronous request and stay responsive across that range.
  • It is multi-modal. A single agent run might generate a hero image with Ideogram v2, a 6-second product loop with Kling 2 i2v, and a 12-second voiceover with ElevenLabs. Each model has its own knobs: aspect ratio, seed, guidance scale, voice ID, motion strength.
  • It is expensive. A FLUX schnell image is fractions of a cent. A Veo 3 clip is several dollars. A loop that retries blindly burns real money in minutes. Tools that expose credit balances and per-call cost let the agent throttle itself.

Asynchronous, multi-modal, expensive. That triad is exactly what a typed protocol with first-class job ids and side-channel calls like get_credits is good at. It is exactly what a flat REST surface is bad at.
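The cost property suggests a simple guard the agent can run before each expensive call. A minimal sketch, assuming a per-model cost table and a get_credits-style balance; the numbers and helper names here are illustrative, not AgentFramer's actual pricing or API.

```typescript
// Sketch of credit-aware throttling. The cost table is illustrative
// (flux-1.1-pro at 4 credits matches the walkthrough below; the others
// are made-up magnitudes, not real prices).
const COST_CREDITS: Record<string, number> = {
  "flux-schnell": 1, // cheap image
  "flux-1.1-pro": 4, // mid-range image
  "veo-3": 400,      // expensive video
};

function canAfford(balance: number, model: string): boolean {
  const cost = COST_CREDITS[model];
  if (cost === undefined) throw new Error(`unknown model: ${model}`);
  return balance >= cost;
}

// An agent that checks its balance between calls can degrade gracefully
// instead of retrying blindly and burning money:
function pickModel(balance: number, preferred: string, fallback: string): string {
  if (canAfford(balance, preferred)) return preferred;
  if (canAfford(balance, fallback)) return fallback;
  throw new Error("insufficient credits; call top_up_balance");
}
```

The point is not the arithmetic but where it runs: because cost is part of the tool surface, the model can make this decision itself rather than discovering an overdraft from a failed HTTP call.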

A generate_image walkthrough, end to end

Here is what a single image generation looks like over MCP. The agent calls the generate_image tool with a JSON payload that matches the schema the server published at handshake:

// Agent -> MCP server: tools/call
{
  "name": "generate_image",
  "arguments": {
    "model": "flux-1.1-pro",
    "prompt": "a brushed-aluminum espresso machine on a marble counter, soft morning light",
    "aspect_ratio": "3:2",
    "seed": 42,
    "guidance_scale": 3.5,
    "num_outputs": 1
  }
}

The server validates the payload, deducts credits, dispatches to the provider, and returns a job id immediately — usually under 200ms. The agent does not wait on the wire for the model.

// MCP server -> agent: tools/call response
{
  "id": "gen_8b2c1f3a",
  "status": "queued",
  "model": "flux-1.1-pro",
  "estimated_seconds": 5,
  "credits_charged": 4
}

// Agent -> MCP server: tools/call (a few seconds later)
{
  "name": "get_generation",
  "arguments": { "id": "gen_8b2c1f3a" }
}

// MCP server -> agent
{
  "id": "gen_8b2c1f3a",
  "status": "succeeded",
  "url": "https://cdn.agentframer.com/g/gen_8b2c1f3a.png",
  "width": 1536,
  "height": 1024,
  "duration_ms": 4820
}

Two tool calls. The estimated_seconds hint tells the agent when to check back, so there is no polling cadence to tune, no auth header to set, and no timeout to argue about. The agent reads the URL, drops it into the document it is editing, and moves on. The full list of tool shapes — generate_video, generate_audio, list_recent_generations, get_credits, top_up_balance — is documented in the MCP tools reference.
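The queued-to-succeeded lifecycle in the walkthrough can be modeled as a small state loop. This sketch drives the loop over a fake get_generation so it runs without a network; a real agent would await an MCP tools/call round trip and sleep between attempts.

```typescript
// Models the agent's side of the job lifecycle: call get_generation until
// the job settles or attempts run out. `poll` stands in for a real MCP
// tools/call round trip.
type Generation =
  | { status: "queued" | "processing" }
  | { status: "succeeded"; url: string }
  | { status: "failed"; error: string };

function resolveGeneration(poll: () => Generation, maxAttempts = 30): string {
  for (let i = 0; i < maxAttempts; i++) {
    const g = poll();
    if (g.status === "succeeded") return g.url;
    if (g.status === "failed") throw new Error(g.error);
    // queued / processing: a real agent would sleep ~estimated_seconds here
  }
  throw new Error("timed out");
}

// Simulate a job that succeeds on the third poll:
const states: Generation[] = [
  { status: "queued" },
  { status: "processing" },
  { status: "succeeded", url: "https://cdn.agentframer.com/g/gen_8b2c1f3a.png" },
];
let attempt = 0;
const url = resolveGeneration(() => states[attempt++]);
```

The difference from the REST version below is who owns this loop: over MCP it lives once in the host or server, not in every client.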

Before and after: 40 lines of REST vs one tool call

To make the difference concrete, here is the kind of REST polling client a developer ends up writing if their image provider does not expose an MCP server:

// 40-line REST polling client (paraphrased)
const res = await fetch("https://api.provider.com/v1/images", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.PROVIDER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "flux-1.1-pro",
    prompt,
    aspect_ratio: "3:2",
    seed: 42,
  }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const { id } = await res.json();

// poll
let url: string | null = null;
for (let i = 0; i < 30; i++) {
  await new Promise((r) => setTimeout(r, 2000));
  const poll = await fetch(`https://api.provider.com/v1/images/${id}`, {
    headers: { Authorization: `Bearer ${process.env.PROVIDER_API_KEY}` },
  });
  const body = await poll.json();
  if (body.status === "succeeded") { url = body.url; break; }
  if (body.status === "failed") throw new Error(body.error);
}
if (!url) throw new Error("timed out");
// ...and now repeat for video, audio, retries, credit checks, model selection.

From the agent's side, with MCP, the same operation is one line:

// Inside Claude Code, Cursor, or Claude Desktop
generate_image({ model: "flux-1.1-pro", prompt, aspect_ratio: "3:2", seed: 42 })

The 40 lines do not disappear. They move into the MCP server, where they are written once, instrumented, and shared by every agent that connects. That is the trade MCP makes — centralize the glue, free the caller. The savings compound when you add video and audio. Each new modality would mean another bespoke client, another auth header, and another retry policy under REST. Under MCP it is one more tool entry in the same server, and the agent is already fluent in calling it.
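The "glue moves into the server" trade can be sketched as a dispatch table: one handler per tool, registered once, shared by every caller. The handler shape and names here are illustrative, not the MCP SDK's actual API.

```typescript
// Sketch of the centralization trade: each tool's provider glue is written
// once behind a name. Handlers here are stubs; in a real server each would
// wrap the provider's REST client, retries, and credit accounting.
type ToolHandler = (args: Record<string, unknown>) => { id: string; status: string };

const tools = new Map<string, ToolHandler>();

function registerTool(name: string, handler: ToolHandler): void {
  tools.set(name, handler);
}

function dispatch(name: string, args: Record<string, unknown>) {
  const handler = tools.get(name);
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}

// Adding a modality is one more registration, not another bespoke client:
let counter = 0;
for (const name of ["generate_image", "generate_video", "generate_audio"]) {
  registerTool(name, () => ({
    id: `gen_${String(counter++).padStart(4, "0")}`,
    status: "queued",
  }));
}
```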

There is a second-order effect that matters more than the line count. When the agent owns the dispatch logic, it can compose calls it was never explicitly told to compose. Ask for a product teaser and watch the agent call generate_image for a hero, generate_video with that image as a reference frame, then generate_audio for a voiceover — checking get_credits between calls if the workspace is running low. None of that orchestration was wired by hand.
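That composed teaser flow can be sketched as a pipeline where each step's output feeds the next. The tool functions below are stand-ins for real MCP calls; the argument names are illustrative.

```typescript
// Sketch of the composed flow: hero image -> video with the image as a
// reference frame -> voiceover. Each fake tool returns a URL; a real agent
// would make MCP tools/call round trips and poll each job to completion.
type ToolCall = (args: Record<string, string>) => string;

function makeTeaser(
  generateImage: ToolCall,
  generateVideo: ToolCall,
  generateAudio: ToolCall,
): { hero: string; clip: string; voiceover: string } {
  const hero = generateImage({ prompt: "hero shot" });
  // The hero URL flows into the next call as a reference frame:
  const clip = generateVideo({ reference_image: hero, prompt: "6s product loop" });
  const voiceover = generateAudio({ script: "product voiceover" });
  return { hero, clip, voiceover };
}

const teaser = makeTeaser(
  () => "img.png",
  (a) => `vid-from-${a.reference_image}.mp4`,
  () => "vo.mp3",
);
```

Nothing in this sketch is teaser-specific, which is the point: the agent composes whatever sequence the request implies, because every step is just another typed call.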

Transports: stdio, SSE, streamable HTTP

MCP defines three transports. stdio runs the server as a local subprocess and pipes JSON-RPC over stdin and stdout — used by Claude Desktop and Cursor for local servers. SSE (Server-Sent Events) was the original remote transport: a long-lived GET for server messages and a separate POST for client messages. Streamable HTTP is the current remote transport, a single endpoint that handles both directions and supports resumable sessions across reconnects.

AgentFramer runs as a hosted, remote MCP server over streamable HTTP. You add one URL to Claude Code, Cursor, Windsurf, Codex, or Claude Desktop and the agent has image, video, and audio tools the next time it starts. There is no local process to keep alive, no Docker container, and no port to forward. When the agent restarts mid-job, the streamable HTTP session resumes against the same job ids, so a two-minute Veo 3 render does not get orphaned by an editor reload.

stdio is still useful if you are writing your own MCP server for internal tools — it is the lowest-friction way to get a Python or Node script in front of an agent without thinking about networking. For a third-party media platform, though, stdio means shipping a subprocess, an updater, and per-OS install instructions. Streamable HTTP collapses that into a URL.
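To make the stdio transport concrete: its framing is newline-delimited JSON-RPC, so the lowest-friction server really is a loop that reads a line, parses it, and writes a response line. This sketch shows the framing and a tools/list reply only; a real server would also implement the initialize handshake, and the wiring comment at the end is an assumption about how you would attach it to a process.

```typescript
// Minimal stdio-style framing: one JSON-RPC message per line.
// Handles tools/list only, as an illustration of why stdio is the
// lowest-friction way to put a script in front of an agent.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

function handleLine(line: string): string {
  const req = JSON.parse(line) as JsonRpcRequest;
  if (req.method === "tools/list") {
    return JSON.stringify({
      jsonrpc: "2.0",
      id: req.id,
      result: { tools: [{ name: "generate_image" }] },
    });
  }
  // JSON-RPC's standard "method not found" error code:
  return JSON.stringify({
    jsonrpc: "2.0",
    id: req.id,
    error: { code: -32601, message: "method not found" },
  });
}

// A real server would wire this to the process streams, e.g.:
// readline.createInterface({ input: process.stdin })
//   .on("line", (l) => process.stdout.write(handleLine(l) + "\n"));
```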

OAuth: agents never see your API keys

The reason a hosted MCP server can be safe is OAuth. When the host (Claude Code, Cursor, Claude Desktop) connects to AgentFramer, it opens a browser, the user signs in, the workspace authorizes the host, and the host receives a scoped token bound to that workspace. The agent never holds the API key. It cannot exfiltrate one because it never sees one.

This is a meaningful upgrade over the "paste your API key into a config file" pattern. Tokens are scoped per workspace, can be revoked from the dashboard, and rotate without code changes. If a laptop with Claude Code installed gets stolen, you revoke that session in the AgentFramer dashboard and the workspace is sealed — you do not have to rotate a master key across every other machine that uses it. Read more in the security overview.

The OAuth handshake is also where workspace selection happens. The same user can be a member of three workspaces — personal, agency, client — and the host lets the user pick which one this session is billed against. The agent then sees list_workspaces and switch_workspace as tools and can move between them on instruction, without ever holding raw credentials for any of them.
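The workspace-scoped token model reduces to a claims check: the server binds each token to one workspace, and revocation is a set-membership test rather than a key rotation. A minimal sketch, with all field names illustrative.

```typescript
// Sketch of workspace-scoped tokens: each token carries a workspace claim,
// so revoking a stolen session removes one token without touching any
// other machine's credentials. Names are illustrative.
interface ScopedToken {
  token: string;
  workspace: string;
}

const revoked = new Set<string>();

function authorize(t: ScopedToken, workspace: string): boolean {
  if (revoked.has(t.token)) return false; // revoked from the dashboard
  return t.workspace === workspace;       // token is bound to one workspace
}

const laptop: ScopedToken = { token: "tok_abc", workspace: "agency" };
const beforeTheft = authorize(laptop, "agency"); // valid session
revoked.add("tok_abc");                          // laptop stolen: revoke it
const afterTheft = authorize(laptop, "agency");  // workspace sealed
```

Contrast this with a pasted master API key, where the only remedy after a theft is rotating the key everywhere it was ever copied.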

When MCP is the wrong abstraction

MCP is a protocol shaped around a language model deciding when and how to call a tool. That shape is wrong for a few real cases:

  • Backend cron jobs. A nightly script that renders 500 product thumbnails does not need an LLM in the loop. Use the REST API directly. You want deterministic batching, not a model deciding parameters per item.
  • CI pipelines. A GitHub Action that generates an OG image on every release should be a plain HTTP POST. MCP adds a handshake, a session, and a JSON-RPC layer that buys you nothing in a non-interactive environment.
  • Non-LLM callers. If the caller is a frontend, a Lambda, or a Rust service, you do not have a model to dispatch tool calls. MCP is built for language-model clients. Skip it.
  • Tight latency budgets. MCP adds a tools/list round trip on connect and a JSON-RPC frame per call. For sub-100ms inner loops, talk to the provider directly.

The rule of thumb: if there is a language model on one side of the call making decisions about what to invoke, MCP is the right shape. If the caller is a script, REST is fine. AgentFramer ships both — the same workspace exposes an MCP endpoint for agents and a REST endpoint for the cron job that fires once a night. They share credits, storage, and the same generation history.

What to do next

If your agent already lives in Claude Code, Cursor, Windsurf, Codex, or Claude Desktop, you can have generate_image, generate_video, and generate_audio working in about two minutes. Spin up a workspace, add the MCP URL, and your next prompt that mentions an image will actually produce one. The full model list — FLUX, SDXL, SD3.5, Ideogram, Imagen, Sora 2, Veo 3, Runway Gen-4, Kling, Luma Ray 2, Suno, ElevenLabs, Cartesia — is on the models page. Pricing is per-call, no monthly minimum, and you can top up from inside the agent.

The version of an AI agent that has to stop, open a browser, and paste a prompt into a generation UI is the 2024 version. The version that calls a tool and hands you the URL is the one your competitors are building this quarter.