April 2026

MCP vs REST APIs for AI media generation

One task, two surfaces. The same generation request takes about thirty lines over REST and four lines over MCP. Here is the honest tradeoff, with numbers.

The same job, written twice

Generate one image. Get the URL. That is the whole task. AgentFramer exposes both a REST API and a hosted MCP server that share credits, storage, and workspaces, so the comparison is apples to apples.

Over REST, the client owns the state machine. POST creates a job, then the client polls until the status flips to succeeded, then it reads the URL. Around thirty lines of TypeScript once you handle errors, backoff, and the case where the job fails halfway through.

// REST: POST -> poll -> read URL
const API = "https://api.agentframer.com/v1";
const KEY = process.env.AGENTFRAMER_API_KEY!;

async function generate(prompt: string): Promise<string> {
  const create = await fetch(`${API}/generations`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ type: "image", prompt, model: "flux-1.1-pro" }),
  });
  if (!create.ok) throw new Error(`create ${create.status}`);
  const { id } = (await create.json()) as { id: string };

  // Poll with linear backoff: 1.00 s, 1.25 s, 1.50 s, ... for at most 60 attempts.
  for (let i = 0; i < 60; i++) {
    await new Promise((r) => setTimeout(r, 1000 + i * 250));
    const poll = await fetch(`${API}/generations/${id}`, {
      headers: { Authorization: `Bearer ${KEY}` },
    });
    if (!poll.ok) throw new Error(`poll ${poll.status}`);
    const job = (await poll.json()) as {
      status: "queued" | "running" | "succeeded" | "failed";
      url?: string;
      error?: string;
    };
    if (job.status === "succeeded") {
      // Fail fast if a succeeded job carries no URL instead of polling until timeout.
      if (!job.url) throw new Error("succeeded without url");
      return job.url;
    }
    if (job.status === "failed") throw new Error(job.error ?? "failed");
  }
  // Every attempt came back queued or running.
  throw new Error("timeout");
}
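To make the cost of that loop concrete, here is a small sketch that adds up the sleep time of the backoff schedule above. It reuses the 1000 ms + i * 250 ms delay and the 60-attempt cap from the REST example; the function names are illustrative, and request time is ignored.

```typescript
// Delay before poll attempt i, copied from the REST example: 1000 ms + i * 250 ms.
function delayMs(attempt: number): number {
  return 1000 + attempt * 250;
}

// Total time spent sleeping if all `maxAttempts` polls come back unfinished.
function worstCaseWaitMs(maxAttempts: number): number {
  let total = 0;
  for (let i = 0; i < maxAttempts; i++) total += delayMs(i);
  return total;
}

const budgetMs = worstCaseWaitMs(60); // 502500 ms, about 8.4 minutes
```

So the "timeout" error in the example fires after roughly eight and a half minutes of waiting, which is worth knowing before you put the function behind a request handler.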

Over MCP, the agent calls a tool. The server holds the job, streams progress notifications back, and resolves with a structured result that already contains the URL. No polling loop, no job IDs to track, no retry math.

// MCP: one tool call, URL in the result
// Assumes `client` is an already-connected MCP client and `prompt` is in scope.
const result = await client.callTool({
  name: "generate_image",
  arguments: { prompt, model: "flux-1.1-pro" },
});
const url = result.structuredContent?.url as string;
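The cast on the last line trusts the server. A slightly more defensive reading of the result might look like the sketch below. The `ToolResult` interface is a hand-written subset of an MCP tool result (`isError`, `content`, `structuredContent`), and `extractUrl` is a hypothetical helper, not part of any SDK.

```typescript
// Minimal local shape of an MCP tool result; the real SDK type has more fields.
interface ToolResult {
  isError?: boolean;
  content?: Array<{ type: string; text?: string }>;
  structuredContent?: Record<string, unknown>;
}

// Returns the generated asset URL, or throws with the server's error text.
function extractUrl(result: ToolResult): string {
  if (result.isError) {
    const message = (result.content ?? [])
      .filter((item) => item.type === "text" && item.text)
      .map((item) => item.text)
      .join(" ");
    throw new Error(message || "tool call failed");
  }
  const url = result.structuredContent?.url;
  if (typeof url !== "string" || url.length === 0) {
    throw new Error("tool result has no url");
  }
  return url;
}
```

Tool-level failures in MCP come back as a normal result with `isError` set rather than as a thrown exception, so checking the flag before reading the URL avoids handing `undefined` to the next step.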

Auth: long-lived key vs short-lived token

REST authenticates with an API key the caller stores in an env var. Unless you provision it more narrowly, the key has full account scope; it never expires and lives wherever the caller runs. That is fine for a backend behind your firewall. It is not fine inside an editor or chat UI where the model can see the environment.

MCP delegates auth to the host. AgentFramer's MCP server speaks OAuth 2.1 with PKCE: the host opens a browser, the user grants scopes (workspace, generation, billing), and the host receives a refresh token plus an access token that expires in 60 minutes. The agent process never holds a long-lived secret. Tokens stay in the host keychain, not in the model's context.

Concretely, that flow is four steps the user sees once: click connect, sign in, approve scopes, return to the editor. After that, refresh happens silently. See the security model for the scope list.
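The silent refresh comes down to an expiry check the host runs before each call. The sketch below shows one way a host could do it. The 60-minute lifetime is from the text; the `TokenSet` record, the function names, and the one-minute safety margin are assumptions made for illustration.

```typescript
// Hypothetical token record a host might keep in its keychain.
interface TokenSet {
  accessToken: string;
  refreshToken: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

const ACCESS_TOKEN_TTL_MS = 60 * 60 * 1000; // 60-minute lifetime from the text
const REFRESH_SKEW_MS = 60 * 1000; // assumption: refresh one minute early

// Builds the stored record from a token response received at `issuedAtMs`.
function storeTokens(accessToken: string, refreshToken: string, issuedAtMs: number): TokenSet {
  return { accessToken, refreshToken, expiresAtMs: issuedAtMs + ACCESS_TOKEN_TTL_MS };
}

// True when the access token is expired or within the safety margin of expiring.
function needsRefresh(tokens: TokenSet, nowMs: number): boolean {
  return nowMs >= tokens.expiresAtMs - REFRESH_SKEW_MS;
}
```

A host would call `needsRefresh(tokens, Date.now())` before each tool call and trade the refresh token for a new pair when it returns true. The model never sees either value.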

The blast radius is different too. A leaked API key from a CI log grants whatever scopes you provisioned it with until you rotate it. A leaked MCP access token expires on its own within an hour and was never written to disk in the first place. For a tool that spends credits, that gap is the difference between a panicked rotation and a boring incident report.

Discovery: 200 KB of OpenAPI vs streamed tool list

A REST-only setup forces the model to learn endpoints from documentation. The simplest path is to drop the OpenAPI spec into the system prompt. AgentFramer's spec is around 200 KB of JSON, roughly 45 thousand tokens, before you have asked it to generate anything. Truncating helps but means the model now has a partial map of the API.
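As a quick sanity check on those figures, the arithmetic below derives the implied bytes per token and the share of a context window the spec would take up. The two spec figures come from the text; the 200,000-token window is an assumption picked only to make the ratio concrete.

```typescript
const SPEC_BYTES = 200_000; // "around 200 KB" from the text, taken as 200,000 bytes
const SPEC_TOKENS = 45_000; // "roughly 45 thousand tokens" from the text

// Bytes of spec per token implied by the two figures above.
const bytesPerToken = SPEC_BYTES / SPEC_TOKENS; // about 4.4

// Share of a context window the spec would occupy.
function contextShare(specTokens: number, windowTokens: number): number {
  return specTokens / windowTokens;
}

// Assumption: a 200,000-token context window, chosen only for illustration.
const share = contextShare(SPEC_TOKENS, 200_000); // 0.225, i.e. 22.5%
```

On a smaller window the share grows proportionally, which is why truncation becomes tempting.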

MCP solves this at the protocol level. On connect, the server answers tools/list with twelve tools and their JSON Schemas: generate_image, generate_video, generate_audio, get_generation, list_recent_generations, get_credits, list_workspaces, switch_workspace, top_up_balance, list_members, invite_team_member, and submit_feedback. The host registers them with the model directly. The schemas count as tool definitions, not free-text context, and the host can hide tools the user did not authorize. See the full tool reference for argument shapes.
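Hiding unauthorized tools can be as simple as a filter between tools/list and the model. The sketch below assumes a host-side table that maps tool names to the scope they need. The scope names are the three from the text, but the table itself is hypothetical and not AgentFramer's actual mapping.

```typescript
// Subset of the fields a tools/list entry carries in MCP.
interface ToolDef {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
}

type Scope = "workspace" | "generation" | "billing";

// Hypothetical mapping from tool name to required scope, for illustration only.
const REQUIRED_SCOPE: Record<string, Scope> = {
  generate_image: "generation",
  get_credits: "billing",
  top_up_balance: "billing",
  list_workspaces: "workspace",
};

// Keeps only tools whose required scope was granted; unmapped tools are hidden.
function visibleTools(tools: ToolDef[], granted: Set<Scope>): ToolDef[] {
  return tools.filter((tool) => {
    const scope = REQUIRED_SCOPE[tool.name];
    return scope !== undefined && granted.has(scope);
  });
}
```

A user who approved only the generation scope would see `generate_image` but not `top_up_balance`, so the model cannot even attempt a call it is not allowed to make.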

Latency: in the noise

Both surfaces end at the same generation pipeline. REST is plain HTTP; MCP rides on stdio, SSE, or streamable HTTP. The wire-level overhead of MCP is one extra JSON-RPC envelope per call, sub-millisecond. The model itself can spend a second or two reasoning before issuing a tool call, which dwarfs anything either protocol adds.

Where the experience diverges is perceived latency. With REST polling at one-second intervals, the client learns the job has finished up to a second late, with nothing but a status string in between. MCP progress notifications arrive when the server sends them, typically every few hundred milliseconds while the image renders. Same total time, smoother feedback.
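On the host side, each progress notification can be turned into a status line with a few lines of code. The sketch below uses the parameter names MCP defines for notifications/progress (`progressToken`, `progress`, `total`, `message`); the formatting itself and the `formatProgress` name are illustrative choices.

```typescript
// Params of an MCP notifications/progress message.
interface ProgressParams {
  progressToken: string | number;
  progress: number;
  total?: number;
  message?: string;
}

// Turns a progress notification into a short status line for the UI.
function formatProgress(p: ProgressParams): string {
  const label = p.message ?? "working";
  // Without a usable total there is no percentage, so show the raw counter.
  if (p.total === undefined || p.total <= 0) return `${label} (${p.progress})`;
  const percent = Math.min(100, Math.round((p.progress / p.total) * 100));
  return `${label} ${percent}%`;
}
```

For example, `formatProgress({ progressToken: "t1", progress: 3, total: 12, message: "rendering" })` yields `"rendering 25%"`.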

The hidden cost on the REST side is the polling itself. A 12-second image render at one-second intervals burns 12 round trips that mostly return running. None of those count against rate limits in any meaningful way, but they do count against your egress and your patience when you scale to thousands of concurrent jobs. MCP folds all of that into one open stream.
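The round-trip count is easy to check. The sketch below counts polls for a 12-second render under a fixed one-second interval and under the backoff schedule from the REST example. It ignores request latency, and the figure of 5,000 concurrent jobs is an illustrative assumption, not a measured number.

```typescript
// Number of polls until the first poll that lands at or after the render finishes.
function roundTrips(renderMs: number, delayFor: (attempt: number) => number): number {
  let elapsed = 0;
  let attempt = 0;
  while (true) {
    const delay = delayFor(attempt);
    if (delay <= 0) throw new Error("delay must be positive");
    elapsed += delay;
    attempt += 1;
    if (elapsed >= renderMs) return attempt;
  }
}

const fixed = roundTrips(12_000, () => 1000); // 12
const backoff = roundTrips(12_000, (i) => 1000 + i * 250); // 7
const fleet = fixed * 5_000; // 60000 requests across 5,000 concurrent jobs
```

Backoff cuts the count from 12 to 7, at the price of learning about completion a little later (12.25 s instead of 12 s in this case).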

Resources: URIs the model fetches on demand

Generated images, videos, and audio are megabytes. You do not want them in the model's context. MCP has a first-class resources concept: the server returns a URI like agentframer://generations/abc/image.png, the host caches it, and the model only fetches the bytes when it needs them (for example, to reason about the result or hand it to another tool). REST can mimic this by returning a signed URL, but the discipline is on the client; with MCP it is on the protocol.
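A host that wants to index results by generation ID without touching the bytes can work from the URI alone. The sketch below assumes every generation URI follows the single example given above (`agentframer://generations/<id>/<file>`); the real scheme may carry more segments, and `parseGenerationUri` is a hypothetical helper.

```typescript
// Parts of a resource URI shaped like agentframer://generations/<id>/<file>.
interface GenerationRef {
  id: string;
  file: string;
}

// Assumption: every generation URI follows the one example given in the text.
const GENERATION_URI = /^agentframer:\/\/generations\/([^/]+)\/([^/]+)$/;

// Returns the id and file name, or null for any URI that does not match.
function parseGenerationUri(uri: string): GenerationRef | null {
  const match = GENERATION_URI.exec(uri);
  if (!match) return null;
  return { id: match[1], file: match[2] };
}
```

Only the short URI string enters the model's context; the megabytes behind it stay on the server until something asks for them.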

The same mechanism makes list_recent_generations cheap. Calling it returns metadata and resource URIs, not pixel data. The model can browse a workspace's history, pick the right asset by prompt or timestamp, and only pull bytes when it has decided to act on one. A REST equivalent forces the client to either over-fetch or reimplement that two-stage pattern by hand.
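The first stage of that pattern, choosing an asset from metadata alone, might look like the sketch below. The `GenerationMeta` shape is a guess at what list_recent_generations returns (a URI, the prompt, and an ISO 8601 timestamp in one consistent format); the real fields may differ.

```typescript
// Hypothetical metadata entry; the real list_recent_generations shape may differ.
interface GenerationMeta {
  uri: string;
  prompt: string;
  createdAt: string; // ISO 8601 timestamp, same format and zone for every entry
}

// Stage one: choose by prompt text, newest first, touching metadata only.
function pickLatestMatching(items: GenerationMeta[], needle: string): string | null {
  const wanted = needle.toLowerCase();
  const matches = items
    .filter((item) => item.prompt.toLowerCase().includes(wanted))
    // Same-format ISO timestamps sort correctly as plain strings.
    .sort((a, b) => (a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0));
  return matches.length > 0 ? matches[0].uri : null;
}
```

Stage two is a single resource read on the returned URI, and it only happens if the model decides it needs the pixels.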

When MCP is overkill

MCP exists because models call tools differently than humans do. If no model is in the loop, you do not need it. Picking MCP for a non-agentic workload buys you a stateful session, an OAuth dance, and a transport abstraction in exchange for nothing.

CI jobs. A GitHub Action that renders preview thumbnails on every PR has a known input, a known output, and no agent. POST and poll. Done.
Batch backends. Generating 50,000 product images from a CSV is a queue worker problem. REST scales horizontally with idempotency keys. MCP adds a stateful session you do not need.
Webhooks and triggers. When a row lands in your database and you generate an asset for it, the caller is your application code. Use REST.
Anything with a fixed prompt template. If the prompt is built deterministically from data, the schema discovery and tool selection that MCP buys you are wasted. A typed REST client is simpler to reason about and easier to monitor.
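For the batch case, the idempotency key is what lets a worker retry safely. The sketch below derives a deterministic key from a CSV row and attaches it to the request. The `Idempotency-Key` header name follows common convention and is an assumption here, as are the `ProductRow` shape and the key format; the request body mirrors the REST example above.

```typescript
// 32-bit FNV-1a over UTF-16 code units; fine for a sketch, not a cryptographic hash.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical shape of one CSV row.
interface ProductRow {
  sku: string;
  prompt: string;
}

// Same row in, same key out, so a retried worker does not create a second job.
function buildRequest(row: ProductRow): { headers: Record<string, string>; body: string } {
  return {
    headers: {
      "Content-Type": "application/json",
      // Assumption: header name follows common convention; check the API docs.
      "Idempotency-Key": `img-${row.sku}-${fnv1a(row.prompt)}`,
    },
    body: JSON.stringify({ type: "image", prompt: row.prompt, model: "flux-1.1-pro" }),
  };
}
```

Because the key depends only on the row, any number of workers can pick up the same row after a crash and the server can recognize the duplicate.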

Mixed-mode is the real answer

Most teams running AgentFramer in production use both. Backend services hit REST for cron, batch, and webhook work. Designers and engineers in Claude Code, Cursor, or Claude Desktop hit MCP for ad hoc generation, iteration, and review. They share the same workspace, the same credit pool, and the same storage bucket; only the wire format differs.

Concretely: your nightly job that re-renders product hero images runs over REST with a service-account key scoped to one workspace. Your marketing team's editor uses MCP with OAuth, picks up the same credits, and can hand finished assets back to the same storage URLs the cron job reads from. No duplication.

The transport you pick on the MCP side depends on where the host runs. Local hosts like Claude Desktop and Cursor speak stdio to a process they spawn. Remote hosts and web UIs use SSE or streamable HTTP against the hosted server. AgentFramer ships all three; the tool surface is the same on each. The full catalog of models you can call through either protocol lives on the models page.

How to pick, in one paragraph

If the caller is a model running inside a host the user trusts, use MCP. If the caller is your own code running on your own infrastructure, use REST. If both, run both: AgentFramer is built for that. Start with the quick start, wire whichever surface fits the call site, and keep the other one in your back pocket for when the requirement flips.