April 2026
Async image generation: why polling beats blocking
Image jobs take 8 to 60 seconds. Agent tool budgets are measured in calls, not minutes. If your generation tool blocks, the agent dies waiting. Polling, progress notifications, and webhooks are how you keep the loop alive.
The tool-call budget problem
Every host caps how much an agent can do in one turn. Claude Code stops a turn after roughly 25 tool calls. Cursor sits in the same range. The Anthropic API tool_use loop defaults to 16 iterations before it bails. A blocking image call burns one of those slots and sits there for 30 seconds while the agent stares at the wall.
Multiply that across a product page generating six hero variants and you have already eaten the entire budget on waiting. The agent never gets to write the alt text, upload to the CDN, or call the translation tool. Worse, half the hosts will time the tool out before the model returns and you lose the credits anyway.
AgentFramer's generate_image returns immediately with a job id and an estimated_seconds hint. The agent moves on, polls when ready, and finishes the turn with budget to spare. See the MCP tool reference for every field on the response.
Per-host numbers worth pinning in your head: Claude Code caps a turn near 25 tool calls. Cursor's agent mode behaves similarly, with a soft ceiling around 25 and a hard one when context pressure mounts. Claude Desktop forwards everything to the model and inherits whatever tool_use loop the model is configured with — 16 by default on the Anthropic API. Windsurf is more generous on call count but stricter on per-tool wall time. None of those budgets survive a 30-second blocking call repeated five times.
The job-id loop
The pattern that survives every host and every model:
- Call generate_image. Receive { id, status: "queued", estimated_seconds }.
- Do other work, or wait roughly estimated_seconds before the first poll.
- Call get_generation(id). Statuses are queued, running, succeeded, failed, timeout, cancelled.
- On succeeded, the response carries the URL, the model, and credits charged. Pass the URL to the next tool.
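A minimal sketch of that loop, assuming client bindings named generate_image and get_generation (the exact call shape depends on your MCP host; Generation is the interface defined with the polling helper below):

// Hypothetical client bindings; your MCP host decides the real call shape.
declare function generate_image(args: { prompt: string; model?: string }): Promise<{
  id: string;
  status: "queued";
  estimated_seconds: number;
}>;
declare function get_generation(id: string): Promise<Generation>; // Generation: see below

const job = await generate_image({ prompt: "hero shot, blue colorway" });

// Do other work here, then spend the server's own estimate before polling.
await new Promise((r) => setTimeout(r, job.estimated_seconds * 1000));

const first = await get_generation(job.id);
// On succeeded, pass first.url to the next tool. Anything still queued or
// running goes through the backoff helper in the next section.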
Polling cadence with backoff
Polling every second is wasteful and gets you rate-limited. Polling every 30 seconds is sluggish. The right curve starts at 2 seconds, multiplies by 1.5, and caps at 8 seconds. Thirty attempts is more than enough headroom for a 60-second model. Wrap it in a single helper and use it for every async tool.
type GenerationStatus =
| "queued"
| "running"
| "succeeded"
| "failed"
| "timeout"
| "cancelled";
interface Generation {
id: string;
status: GenerationStatus;
url?: string;
reason?: string;
model: string;
credits_charged?: number;
}
export async function pollGeneration(
  id: string,
  getGeneration: (id: string) => Promise<Generation>,
): Promise<Generation> {
  const START_MS = 2_000; // first wait before polling at all
  const FACTOR = 1.5; // backoff multiplier per attempt
  const CAP_MS = 8_000; // ceiling on the gap between polls
  const MAX_ATTEMPTS = 30; // roughly 220 s of total wait, see below
  let delay = START_MS;
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    await new Promise((r) => setTimeout(r, delay));
    const job = await getGeneration(id);
    if (job.status === "succeeded") return job;
    // Terminal failures each get their own error so the caller can branch.
    if (job.status === "failed") throw new Error(`failed: ${job.reason}`);
    if (job.status === "cancelled") throw new Error("cancelled");
    if (job.status === "timeout") throw new Error("timeout");
    // Still queued or running: back off and try again.
    delay = Math.min(delay * FACTOR, CAP_MS);
  }
  throw new Error("polling exhausted");
}

Total wait if the job runs the full 30 attempts: about 220 seconds. You will almost never hit that. The first poll lands at 2 seconds, the second at 5, the third at 9.5, and from the fifth poll the gap holds at 8 seconds until succeeded. A typical SDXL job resolves in two or three polls. Flux-1.1-pro lands in three to five. A 4K upscale or a video preview can stretch to ten or twelve.
Two tweaks worth knowing. Add jitter — multiply the delay by 0.85 + Math.random() * 0.3 — when you fan out more than a few jobs, so every poll does not arrive on the same tick. And honor the server's estimated_seconds as the first wait instead of the hard 2-second floor when it is greater than 2; you save one round trip on slow models.
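Both tweaks are a few lines each; a sketch that drops into the helper above:

// Jitter: spread each delay across 85%..115% of nominal so fanned-out jobs
// do not all poll on the same tick.
const jitter = (ms: number) => ms * (0.85 + Math.random() * 0.3);

// First wait: honor the server's estimated_seconds when it beats the floor.
function firstDelayMs(estimatedSeconds: number | undefined, startMs = 2_000): number {
  return Math.max((estimatedSeconds ?? 0) * 1000, startMs);
}

// In pollGeneration: seed the first delay with firstDelayMs(estimated_seconds)
// from the generate_image response, and wrap each wait as setTimeout(r, jitter(delay)).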
MCP progress notifications
MCP has a server-to-client message for long-running tools. The server emits notifications/progress while the tool is still executing. Hosts that support it (Claude Desktop, Claude Code, and Cursor in recent builds) render a live progress indicator instead of killing the call. Each notification carries the same job id and a progress fraction:
{
"jsonrpc": "2.0",
"method": "notifications/progress",
"params": {
"progressToken": "gen_8f3a",
"progress": 0.65,
"total": 1,
"message": "denoising step 32/50"
}
}

Progress notifications are a UX layer, not a replacement for the job id. The agent should still call get_generation to read the final URL. Treat notifications as decoration; treat the polled status as truth. If the host drops the connection mid-stream you still have the job id, you can still poll, and the work is not lost.
When you implement an MCP server yourself, emit progress at meaningful checkpoints — denoising step boundaries, upscale start, safety check complete — not on every internal tick. Hosts rate-limit notification rendering and a 200-event firehose just gets dropped. Two to four notifications across a 30-second job is the sweet spot. The MCP integration guide has the wire-format details.
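A sketch of that gating, with notify standing in for whatever notifications/progress sender your MCP SDK exposes:

type Notify = (progress: number, message: string) => void;

// Emit at coarse checkpoints, not on every internal tick; hosts rate-limit
// notification rendering anyway.
const CHECKPOINTS = [0.25, 0.5, 0.75, 1];

function makeProgressReporter(notify: Notify) {
  let next = 0; // index of the next unreported checkpoint
  return (fraction: number, message: string) => {
    // Skip ticks until the job crosses the next checkpoint boundary.
    while (next < CHECKPOINTS.length && fraction >= CHECKPOINTS[next]) {
      notify(CHECKPOINTS[next], message);
      next++;
    }
  };
}

// In the generation loop, report(step / totalSteps, `denoising step ${step}/${totalSteps}`)
// fires four notifications across the whole job, however many steps it runs.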
Webhooks for batch jobs
Once you fire more than five concurrent jobs, polling stops paying off. Subscribe a webhook, hand the user back to their next task, and resume when the batch finishes. The payload is intentionally small so a queue worker can fan it out without parsing the model output:
{
"event": "generation.succeeded",
"id": "gen_8f3a2c19",
"url": "https://cdn.agentframer.com/g/8f3a2c19.png",
"model": "flux-1.1-pro",
"credits_charged": 4,
"created_at": "2026-04-30T10:14:22Z"
}

Failure events use the same shape with generation.failed, generation.timeout, or generation.cancelled and a reason field. Webhooks retry on non-2xx with exponential backoff over 24 hours, so your endpoint must be idempotent on id. The full delivery contract, signature header, and retry policy are in the tool-call patterns guide.
A common pattern: the agent fires twenty image jobs, returns control to the user with a placeholder UI, and a serverless webhook handler updates rows in your database as each job lands. The user sees images stream in over fifteen seconds instead of staring at a spinner for three minutes. The agent's tool budget for that turn was four calls, not twenty.
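A sketch of that handler on the Fetch API, with db standing in for your database client (illustrative names, not a prescribed schema):

// Stand-in for your database client; any store with an upsert works.
declare const db: {
  upsertGeneration(row: {
    id: string;
    status: string;
    url?: string;
    reason?: string;
  }): Promise<void>;
};

interface GenerationEvent {
  event: string; // "generation.succeeded" | "generation.failed" | ...
  id: string;
  url?: string;
  model: string;
  credits_charged?: number;
  reason?: string;
  created_at: string;
}

export async function handleGenerationWebhook(req: Request): Promise<Response> {
  const body = (await req.json()) as GenerationEvent;
  // Keyed on id: the 24-hour retry window can re-deliver the same event,
  // and an upsert makes every delivery after the first a harmless overwrite.
  await db.upsertGeneration({
    id: body.id,
    status: body.event.replace("generation.", ""),
    url: body.url,
    reason: body.reason,
  });
  // Any 2xx stops the retry clock.
  return new Response("ok", { status: 200 });
}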
Idempotency keys
Agents retry. Networks blip. Hosts re-issue tool calls when the model thinks it lost a response. Without idempotency keys, every retry pays for the same image again. Make the key deterministic per logical asset, not per attempt. For a product catalog: sku-A4392/variant-blue is a good key. generate-1714476543112 is a bad one — every retry mints a new key and a new charge.
AgentFramer deduplicates on the key for 24 hours. A duplicate call returns the original job id, the original URL once it succeeds, and credits are not charged twice. For multi-tenant products, namespace the key with the workspace id or user id so two tenants requesting the same SKU do not collide. Combine with scoped API keys per workspace and your blast radius for a leaked key stays contained.
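In code, that is one deterministic function; the exact request field that carries the key is in the MCP tool reference:

// Same logical asset, same key, on every retry; the workspace prefix keeps
// two tenants requesting the same SKU from colliding.
function idempotencyKey(workspaceId: string, sku: string, variant: string): string {
  return `${workspaceId}/${sku}/variant-${variant}`;
}

// idempotencyKey("ws_acme", "sku-A4392", "blue") -> "ws_acme/sku-A4392/variant-blue"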
Three failure modes, three different responses
Async means failure shows up as status, not exceptions. The wrong reflex on any of the three is what turns a glitch into a billable incident. Each status demands a different response:
- failed: read reason. An NSFW filter or policy hit means soften the prompt and retry. A model error means switch models, not prompts. Do not retry blindly.
- timeout: the model never returned. Retry once. If the second attempt also times out, fall back to a faster model from the models catalog.
- cancelled: the workspace ran out of credits or an admin killed the job. Surface it to the user. Never retry. Auto-retry on cancelled is how agents drain a balance overnight.
Bake those three branches into the same handler that wraps the polling helper. The agent should see one return value (an image URL and a model name) or one structured error it can reason about. If you let raw status codes leak into the model's context, it will improvise, and improvisation on a billing-related failure is exactly what you do not want. Pricing details that drive the cost-per-retry tradeoff are on the pricing page.
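Wired up, that is a small classifier between the polling helper and the agent, reusing the Generation type from above; the reason-string matching is an assumption about how the API words policy failures:

type FailureAction =
  | "soften_prompt_and_retry" // policy or NSFW hit
  | "switch_model_and_retry" // model-side error, or a second timeout
  | "retry_once" // first timeout only
  | "surface_and_stop"; // cancelled: never auto-retry

// Map a terminal status to the one response it demands, so the agent sees a
// structured action instead of a raw status it might improvise on.
function classifyFailure(job: Generation, timeoutAlreadyRetried: boolean): FailureAction {
  switch (job.status) {
    case "failed":
      // Assumption: policy and NSFW rejections say so in reason.
      return /nsfw|policy/i.test(job.reason ?? "")
        ? "soften_prompt_and_retry"
        : "switch_model_and_retry";
    case "timeout":
      return timeoutAlreadyRetried ? "switch_model_and_retry" : "retry_once";
    case "cancelled":
      return "surface_and_stop";
    default:
      throw new Error(`not a terminal failure: ${job.status}`);
  }
}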
When sync mode is the right call
Async is the default for a reason, but blocking has a place. Pass wait: true on generate_image (see the sketch after this list) when:
- The model resolves in under 3 seconds. Polling overhead alone doubles the wall time on a fast model.
- The script is single-shot. A cron job that generates one OG image and exits has nothing else to do during the wait.
- The host does not forward MCP progress notifications and has a short tool timeout. Some embedded MCP clients still cut at 10 seconds. Sync with a fast model is safer than async with a host that hangs up mid-poll.
- You are running inside a workflow engine (Inngest, Trigger.dev, Vercel Workflow) where the step itself already has retries and durable timers. The engine is your polling loop.
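When one of those holds, the blocking call is short; a sketch, reusing the hypothetical binding from earlier with the wait flag added:

// Blocking form: with wait: true the promise resolves with the finished
// image instead of a job id. Hypothetical binding, as above.
declare function generate_image(args: {
  prompt: string;
  model?: string;
  wait?: boolean;
}): Promise<{ url: string; model: string; credits_charged: number }>;

// Single-shot cron job: nothing else to do during the wait, so block on a fast model.
const og = await generate_image({
  prompt: "OG image for the launch post",
  model: "sdxl",
  wait: true,
});
console.log(og.url);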
Outside those four cases, default to async. The polling helper above is forty lines and pays for itself the first time a model takes longer than expected.
Ship the loop once
Write the polling helper once. Use it for images, video, audio, anything else that returns a job id. Add idempotency keys at the call site. Subscribe a webhook for batches over five. Handle the three failure statuses with three different responses. That is the entire playbook.
Agents that wait politely outperform agents that block. Start in async mode and your tool budget stops being the bottleneck. Spin up a workspace and the first generation is free.