Why shouldn't I count a 429 as a failed retry attempt?

Because a 429 didn't fail — it was deferred. If you give a job a budget of, say, 5 attempts and a sustained rate limit eats all 5 in backpressure responses, the job dies in the dead-letter queue having never actually run, while a transient 500 later in the job's life has no attempts left to absorb it. Counting backpressure against the retry budget conflates two different things and drops work that would have completed fine if you'd just waited.

What is the Retry-After header and should I always honor it?

Retry-After is a response header on 429 and 503 (and sometimes 529) responses telling you the minimum time to wait before retrying — either as a number of seconds or an HTTP date. Always honor it. If you retry sooner, you stay throttled longer and the downstream window takes more wall-clock time to recover. OpenAI sends retry-after / retry-after-ms; Anthropic sends retry-after in seconds.
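
Since the header comes in two forms, a small parser helps. Here is a minimal sketch, assuming you already have the raw header string in hand; the function name parseRetryAfter and the injectable now parameter are illustrative, not part of any SDK:

```typescript
// Parse a Retry-After header value into a whole number of seconds to wait.
// Returns undefined when the header is absent or unparseable, so the caller
// can apply its own fallback delay.
function parseRetryAfter(
  value: string | null | undefined,
  now: Date = new Date(),
): number | undefined {
  if (value == null) return undefined;
  const trimmed = value.trim();
  if (trimmed === "") return undefined;

  // Form 1: delay-seconds, a non-negative integer such as "120".
  if (/^\d+$/.test(trimmed)) return Number(trimmed);

  // Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const when = Date.parse(trimmed);
  if (Number.isNaN(when)) return undefined;

  // A date in the past means "retry now": clamp to zero.
  return Math.max(0, Math.ceil((when - now.getTime()) / 1000));
}
```

Passing now explicitly keeps the date branch testable; in a worker you would normally call it with just the header value.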

What does a 529 mean and how do I handle one without a Retry-After header?

529 is an unofficial "site is overloaded" status used by some providers (Anthropic's API returns it as overloaded_error) to mean the service is temporarily saturated. It often arrives without a Retry-After header. Treat it as backpressure, not failure, and apply a sensible fallback delay — a few seconds with jitter — then redeliver. Don't burn an attempt on it.

What's the difference between defer and retry in a job queue?

Retry means "that attempt failed, try again and count it against the budget." Defer means "hold this job, don't count this against the budget, redeliver it after a delay." Defer is the right primitive for backpressure: the work didn't fail, it was postponed. SimpleQ exposes defer as a distinct signal — a downstream 429/503/529 is deferred and redelivered automatically, and in ack mode you can call /defer with a retryAfter to signal it explicitly.

Backpressure: handling 429, 503, and 529 without losing work

TL;DR

A 429, 503, or 529 is backpressure, not failure: the downstream service is healthy and your request is valid — it just can't take more load right now. Counting backpressure as a failed retry attempt burns the retry budget and drops work that would have completed. The fix is to defer instead of retry: hold the job, honor Retry-After (with a fallback delay for 529s that omit it), redeliver, and burn no attempt — so a job can ride out a sustained rate limit and still finish.

There's a category error baked into most retry code, and it costs you completed work. When a downstream API returns a 429, the usual handler does the same thing it does for a 500: increment the attempt counter, schedule a backoff, and if attempts run out, give up. But a 429 isn't a failure. Nothing broke. The service looked at your perfectly valid request and said 'not now.' Treating 'not now' the same as 'this is broken' is the bug, and this post is about fixing it.

Failure vs. backpressure: two different signals

Every response your worker gets from a downstream service falls into one of three buckets, and they need three different responses:

| Signal | What it means | Right response |
| --- | --- | --- |
| Success (2xx) | The work completed. | Acknowledge, move on. |
| Failure (500, 502, or 504 on a valid request; 4xx you can fix; timeout; reset) | Something went wrong with this attempt. | Retry with backoff, burning the attempt; eventually dead-letter. |
| Backpressure (429, 503, 529) | The service is fine and your request is valid; it just can't take more load right now. | Defer: wait, redeliver, burn no attempt. |

The middle row and the bottom row look identical in naive code — both are non-2xx, so both get the retry path. But the semantics are opposite. A 500 means this attempt is dead and trying again might help. A 429 means this attempt never happened in any meaningful sense — the request was refused at the door before any work was done. There's nothing to retry, only something to postpone.

- 429 Too Many Requests: you've exceeded a rate limit. OpenAI and Anthropic both use this, with a retry-after header telling you when the window clears.
- 503 Service Unavailable: the service is temporarily down or shedding load. Often carries Retry-After. The request is valid; come back shortly.
- 529 Site Overloaded: an unofficial status some providers use (Anthropic returns it as overloaded_error) to mean 'temporarily saturated.' Frequently arrives with no Retry-After at all.

The tell: was work done?

The cleanest way to classify a response is to ask whether the downstream service did any work. A 500 means it tried and something broke. A 429/503/529 means it refused before doing anything. If no work was done and the request was valid, it's backpressure — and burning a retry attempt on it is throwing away budget you'll want later.
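
In code, that question reduces to a three-way classifier. This is a minimal sketch; the Signal type and the classify function are illustrative names, and it assumes a missing status means no response arrived at all:

```typescript
type Signal = "success" | "backpressure" | "failure";

// Statuses this post treats as "not now" rather than "broken".
const BACKPRESSURE_STATUSES = new Set([429, 503, 529]);

// Classify a downstream outcome. Pass undefined for status when no response
// arrived at all (timeout, connection reset).
function classify(status: number | undefined): Signal {
  if (status === undefined) return "failure";
  if (status >= 200 && status < 300) return "success";
  if (BACKPRESSURE_STATUSES.has(status)) return "backpressure";
  return "failure";
}
```

Keeping the backpressure statuses in one set means the retry path and the defer path can never disagree about which bucket a response belongs to.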

Why counting backpressure burns your budget

Say you give every job a budget of 5 attempts — a reasonable cap for absorbing transient failures. Now a provider rate limit kicks in for two minutes during a traffic spike. Here's what happens if your queue counts each 429 as a failed attempt:

1. Attempt 1: job delivered, worker calls the LLM, gets a 429. Attempt burned. (1 of 5 gone.)
2. Attempt 2 (after backoff): still rate-limited, another 429. (2 of 5.)
3. Attempts 3, 4, 5: the rate limit is still in effect. All burned on 429s.
4. Attempt 6 doesn't exist. The job lands in the dead-letter queue having never actually run.

The job didn't fail. It was simply unlucky enough to be alive during a rate-limit window, and your retry accounting killed it. And the failure mode is exactly inverted from what you want: jobs are most likely to die precisely when the system is busiest, which is when losing work hurts most.

There's a subtler cost too. The retry budget exists to absorb real, transient failures — a flaky network, a momentary 500. If backpressure eats the budget early in a job's life, a genuine transient failure later has no attempts left to absorb it. You've spent your insurance on something that was never a claim.
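
Both costs show up in a small simulation of naive accounting. This is a sketch, not any queue's real implementation: runNaive is a hypothetical function that replays a scripted list of downstream status codes and burns one attempt for every non-2xx response.

```typescript
type JobResult = {
  state: "completed" | "dead-lettered" | "pending";
  attemptsBurned: number;
  deliveries: number;
};

// Naive accounting: every non-2xx response burns one attempt,
// whether it was a real failure (500) or backpressure (429/503/529).
function runNaive(statuses: number[], maxAttempts: number): JobResult {
  let attemptsBurned = 0;
  let deliveries = 0;

  for (const status of statuses) {
    deliveries += 1;
    if (status >= 200 && status < 300) {
      return { state: "completed", attemptsBurned, deliveries };
    }
    attemptsBurned += 1;
    if (attemptsBurned >= maxAttempts) {
      // Budget exhausted: the remaining scripted responses are never reached.
      return { state: "dead-lettered", attemptsBurned, deliveries };
    }
  }
  // The script ran out before the job succeeded or exhausted its budget.
  return { state: "pending", attemptsBurned, deliveries };
}

// Five 429s in a row: the job dead-letters without ever running.
const sustained = runNaive([429, 429, 429, 429, 429, 200], 5);

// Four 429s, then one genuine 500: no attempts are left to absorb it,
// even though the very next delivery would have succeeded.
const lateFailure = runNaive([429, 429, 429, 429, 500, 200], 5);
```

In both runs the trailing 200 is never reached: the budget is gone after five deliveries.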

Naive backoff makes it worse, not better

Layering exponential backoff on top of attempt-counting doesn't fix this; it just spaces out the funeral. Any rate-limit window that outlasts the backoff schedule still kills the job, only more slowly. Backoff is the right tool for transient failures (see /blog/why-job-retries-matter for when retries actually help). It's the wrong tool for backpressure, which needs a fundamentally different primitive: defer, not retry.
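
The arithmetic is worth spelling out. The sketch below assumes a 1-second base delay that doubles each time; those numbers are illustrative, and your queue's schedule will differ.

```typescript
// Delays (in seconds) between consecutive attempts under exponential backoff.
// With N attempts there are N - 1 waits.
function backoffDelays(attempts: number, baseSeconds: number, factor: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts - 1; i++) {
    delays.push(baseSeconds * factor ** i);
  }
  return delays;
}

// Assumed schedule: 5 attempts, 1-second base, doubling each time.
const delays = backoffDelays(5, 1, 2); // [1, 2, 4, 8]
const totalWait = delays.reduce((sum, d) => sum + d, 0); // 15

// The final attempt fires 15 seconds in; the rate limit lasts 120.
const rateLimitSeconds = 120;
const outlastsWindow = totalWait >= rateLimitSeconds; // false
```

A larger base can outlast one particular window (a 10-second base puts the fifth attempt at 150 seconds), but only by spending four of the five attempts on waiting, which leaves nothing for a real failure afterwards.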

The defer model: hold, honor, redeliver

Defer is the primitive backpressure actually needs. Instead of 'that attempt failed, try again and count it,' defer says 'hold this job, wait the requested time, redeliver it, and don't touch the attempt counter.' The three steps:

1. Hold the job: take it out of active delivery so it isn't hammering a downstream that already said no.
2. Honor the Retry-After the downstream sent. That's the service telling you exactly when it'll have capacity again. Respect it; don't guess shorter.
3. Redeliver after the delay, with the attempt counter untouched. As far as the budget is concerned, the deferred delivery never happened.

The payoff is that a job can ride out a sustained rate limit and still complete. The two-minute rate-limit window from the previous section becomes a non-event: the job defers, waits the Retry-After, defers again if it's still throttled, and eventually lands in a delivery where the window has cleared — at which point it runs, succeeds, and acks. Zero attempts burned. Its full retry budget is still intact for any real failure that might come later.
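
To make the accounting concrete, here is a minimal simulation of the defer model against a downstream that is rate-limited for two minutes. It is a sketch under stated assumptions: runJob, the simulated clock, the 30-second Retry-After, and the fixed 1-second retry backoff are all invented for illustration and say nothing about how any particular queue is implemented.

```typescript
type Outcome = { status: number; retryAfter?: number };
type Result = {
  state: "completed" | "dead-lettered";
  attemptsBurned: number;
  deferrals: number;
  finishedAt: number; // simulated seconds since first delivery
};

const BACKPRESSURE = new Set([429, 503, 529]);
const FALLBACK_DELAY = 5; // seconds; assumed fallback when no Retry-After is given
const RETRY_BACKOFF = 1; // seconds; fixed backoff between failed attempts, for brevity

// Drive one job to a final state against a simulated downstream.
// `downstream(t)` returns what the service would answer at simulated time t.
function runJob(downstream: (t: number) => Outcome, maxAttempts: number): Result {
  let t = 0;
  let attemptsBurned = 0;
  let deferrals = 0;

  while (true) {
    const { status, retryAfter } = downstream(t);

    if (status >= 200 && status < 300) {
      return { state: "completed", attemptsBurned, deferrals, finishedAt: t };
    }

    if (BACKPRESSURE.has(status)) {
      // Hold, honor Retry-After, redeliver. The attempt counter is untouched.
      deferrals += 1;
      t += retryAfter ?? FALLBACK_DELAY;
      continue;
    }

    // Real failure: burn an attempt, dead-letter when the budget is spent.
    attemptsBurned += 1;
    if (attemptsBurned >= maxAttempts) {
      return { state: "dead-lettered", attemptsBurned, deferrals, finishedAt: t };
    }
    t += RETRY_BACKOFF;
  }
}

// A downstream that is rate-limited for the first 120 seconds, then healthy.
const rateLimitedForTwoMinutes = (t: number): Outcome =>
  t < 120 ? { status: 429, retryAfter: 30 } : { status: 200 };

const result = runJob(rateLimitedForTwoMinutes, 5);
```

The job defers four times, completes at the 120-second mark, and finishes with all five attempts unspent. A production version would also cap total deferral time, which this sketch omits.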

This is how SimpleQ handles backpressure by default. SimpleQ is a managed, push-based transport: you POST a job over HTTP, SimpleQ durably stores it and POSTs it to your own webhook, and your worker runs the business logic. When your worker responds with a 429, 503, or 529 and a Retry-After, SimpleQ defers the job for that duration and redelivers it — no attempt burned. Pair it with a per-queue fixed-window rate limit (rateLimitMax / rateLimitWindow) so SimpleQ paces delivery to match the downstream's ceiling in the first place, and most backpressure never happens at all.

The worker: return 429 with Retry-After

In standard delivery mode, your webhook has a hard 15-second window to respond. To signal backpressure, you pass the downstream's backpressure signal and its Retry-After straight back to SimpleQ. Here's a worker that calls an LLM and surfaces a 429 cleanly. It uses OpenAI's gpt-4o-mini, though the same shape works with Anthropic's API (for example claude-sonnet-4-6), which sends retry-after on a 429 and returns a 529 overloaded_error when saturated:

app/run-llm-job/route.ts

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const DEFAULT_BACKPRESSURE_DELAY = 5; // seconds, plus jitter applied by SimpleQ

export async function POST(req: Request) {
  const job = await req.json();

  try {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: job.payload.messages,
      max_tokens: 512,
    });

    // saveResult is your own persistence function (not shown).
    await saveResult(job.id, completion);
    // 2xx = success. SimpleQ acks the job.
    return Response.json({ ok: true });
  } catch (err: any) {
    // Backpressure: tell SimpleQ to defer, not fail.
    if (err.status === 429 || err.status === 503 || err.status === 529) {
      // OpenAI sends retry-after (seconds) or retry-after-ms.
      const retryAfter =
        header(err, "retry-after") ??
        secondsFromMs(header(err, "retry-after-ms")) ??
        DEFAULT_BACKPRESSURE_DELAY; // fallback for 529s with no header

      return new Response(null, {
        status: 429,
        headers: { "Retry-After": String(retryAfter) },
      });
    }

    // Anything else is a real failure: let SimpleQ retry it normally.
    return new Response("job failed", { status: 500 });
  }
}

// Depending on the SDK version, error headers may be a plain object or a
// Headers instance; handle both.
function header(err: any, name: string): string | undefined {
  const h = err.headers;
  const value = typeof h?.get === "function" ? h.get(name) : h?.[name];
  return value ?? undefined;
}

function secondsFromMs(ms?: string) {
  return ms ? Math.ceil(Number(ms) / 1000) : undefined;
}
```

The whole pattern lives in the catch block. A 429/503/529 from the provider becomes a 429 + Retry-After from your webhook, which SimpleQ reads as 'defer, don't fail.' Everything else becomes a 500, which SimpleQ reads as a genuine failure and retries against the budget with normal backoff. Two response codes, two semantics, no conflation.

Folding in 529: a sensible fallback delay

529 is the awkward one. It's not a registered HTTP status code, but several providers use it to mean 'overloaded, try again shortly' — Anthropic returns it as an overloaded_error. The complication is that 529s frequently arrive with no Retry-After header at all, because the service is too saturated to estimate when it'll recover.

You can't honor a header that isn't there, so you supply a sensible fallback delay instead. The rules of thumb:

- Pick a small fixed base: a few seconds, not minutes. The service is overloaded, not down; you want to check back soon without piling on.
- Add jitter. If every deferred job wakes up at the same instant, they re-saturate the service and trigger another wave of 529s. Spreading wake-ups across a window breaks the thundering herd.
- Cap the total time a job will defer. A job that has deferred for, say, an hour straight is a signal something is genuinely wrong upstream. At that point you may want it to fail loudly rather than wait forever.

In the worker above, DEFAULT_BACKPRESSURE_DELAY is that fallback: when the provider gives you a 529 with no header, you return Retry-After: 5 and SimpleQ defers the job 5 seconds (applying jitter so a batch of overloaded jobs doesn't redeliver in lockstep). The job still burns zero attempts; it's still backpressure, just backpressure where you had to estimate the wait yourself.
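
If you are building the defer machinery yourself rather than relying on a queue that applies jitter for you, the three rules of thumb can be sketched as follows. The constants, the fallbackDefer name, and the injectable random source are assumptions chosen for illustration:

```typescript
const BASE_DELAY_SECONDS = 5; // small fixed base, a few seconds
const JITTER_SECONDS = 5; // assumed width of the jitter window
const MAX_TOTAL_DEFER_SECONDS = 3600; // assumed cap: fail loudly after an hour

type DeferDecision =
  | { action: "defer"; delaySeconds: number }
  | { action: "fail"; reason: string };

// Decide how long to defer when the downstream sent no Retry-After.
// `deferredSoFar` is the total time (seconds) this job has already spent deferred.
// `random` is injectable so the jitter can be made deterministic in tests.
function fallbackDefer(
  deferredSoFar: number,
  random: () => number = Math.random,
): DeferDecision {
  if (deferredSoFar >= MAX_TOTAL_DEFER_SECONDS) {
    return { action: "fail", reason: "deferred past the cap; upstream is likely unhealthy" };
  }
  // Base plus a uniform offset in [0, JITTER_SECONDS) spreads wake-ups out.
  const delaySeconds = BASE_DELAY_SECONDS + random() * JITTER_SECONDS;
  return { action: "defer", delaySeconds };
}
```

Returning a decision object instead of sleeping inside the function keeps the policy separate from whatever scheduler actually holds the job.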

529 and 503 are backpressure, not just 429

It's easy to special-case 429 and forget the 5xx-shaped backpressure signals. A 503 with Retry-After and a 529 overloaded error are both 'not now,' not 'broken,' so handle all three in the same backpressure branch. The 5xx responses to treat as real failures are 500, 502, and 504 on a valid request, where work was attempted and something actually went wrong.

Long jobs: the ack-mode /defer callback

Standard mode's 15-second webhook timeout is fine when your worker's job is just to relay a quick downstream call. But for long-running work — a multi-step pipeline, a large completion, a batch of API calls — you'll use ack mode instead: your webhook returns 200 fast to confirm receipt, then your app reports the outcome later out of band. The Anthropic template extends the ack window to 600 seconds and the OpenAI template to 300.

In ack mode you have three explicit signals, which map exactly onto the three response buckets from the top of this post:

| Callback | Meaning | Effect on retry budget |
| --- | --- | --- |
| POST /v1/jobs/:id/ack | The work succeeded. | Job completes. |
| POST /v1/jobs/:id/nack | The work failed (set retryable to control whether it retries). | Burns an attempt if retryable; otherwise dead-letters. |
| POST /v1/jobs/:id/defer | Backpressure: hold and redeliver after retryAfter seconds. | No attempt burned. |

So when a long-running job hits a rate limit mid-flight, you don't nack it (that's failure) — you call /defer with a retryAfter and let SimpleQ redeliver once the window clears:

lib/report-outcome.ts

```typescript
const BASE = "https://api.simpleq.io";
const auth = { Authorization: "Bearer sq_live_..." };

async function processJob(job: { id: string; payload: any }) {
  try {
    const result = await runLongPipeline(job.payload);

    // Success: acknowledge so the job completes.
    await fetch(`${BASE}/v1/jobs/${job.id}/ack`, {
      method: "POST",
      headers: auth,
    });
  } catch (err: any) {
    if (err.status === 429 || err.status === 503 || err.status === 529) {
      // Backpressure: defer. No attempt is burned.
      const retryAfter = Number(err.headers?.["retry-after"]) || 5;
      await fetch(`${BASE}/v1/jobs/${job.id}/defer`, {
        method: "POST",
        headers: { ...auth, "Content-Type": "application/json" },
        body: JSON.stringify({ retryAfter }),
      });
    } else {
      // Real failure: nack as retryable so the budget absorbs it.
      await fetch(`${BASE}/v1/jobs/${job.id}/nack`, {
        method: "POST",
        headers: { ...auth, "Content-Type": "application/json" },
        body: JSON.stringify({ retryable: true }),
      });
    }
  }
}
```

The shape is identical to the standard-mode worker — the only difference is that you report the outcome with an explicit API call instead of an HTTP status code, because the work outlived the request that delivered it. The official TypeScript SDK, @simpleq/sdk on npm, wraps these calls; the API is HTTP-first underneath, so any language works.

Best line of defense: don't generate the backpressure

Deferring backpressure gracefully is the safety net. The better outcome is generating less of it in the first place, by pacing delivery to the downstream's actual ceiling. SimpleQ's per-queue fixed-window rate limit does this: set rateLimitMax requests per rateLimitWindow seconds on the queue, and every job in that queue shares one budget regardless of how many workers you run — so you don't get the per-worker retry storms that manufacture 429s out of thin air.

create-queue.sh

```bash
curl -X POST https://api.simpleq.io/v1/queues \
  -H "Authorization: Bearer sq_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai-jobs",
    "webhookUrl": "https://your-app.com/run-llm-job",
    "template": "openai",
    "rateLimitMax": 60,
    "rateLimitWindow": 60
  }'
```

With delivery paced at 60 jobs per 60 seconds, most jobs never see a 429 — they're spaced under the downstream's limit before they leave the queue. The defer model then catches the residual backpressure that slips through during spikes or shared-tenant contention. Belt and suspenders: rate limiting prevents most backpressure, defer absorbs the rest, and no job dies for being busy. For the deeper treatment of per-queue rate limiting and shared buckets, see /blog/rate-limiting-strategies-for-apis.
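
For intuition about what a fixed-window limit does, here is a minimal sketch of the general technique. It is not SimpleQ's implementation; the FixedWindowLimiter class and its tryAcquire method are hypothetical, and time is passed in explicitly as seconds so the behavior is easy to trace.

```typescript
// A fixed-window limiter: at most `max` deliveries per `windowSeconds`,
// counted against a single shared budget.
class FixedWindowLimiter {
  private readonly max: number;
  private readonly windowSeconds: number;
  private windowStart = 0;
  private count = 0;

  constructor(max: number, windowSeconds: number) {
    this.max = max;
    this.windowSeconds = windowSeconds;
  }

  // Returns 0 if a delivery may go out at time `now` (in seconds),
  // otherwise the number of seconds until the next window opens.
  tryAcquire(now: number): number {
    const start = Math.floor(now / this.windowSeconds) * this.windowSeconds;
    if (start !== this.windowStart) {
      // A new window has begun: reset the shared counter.
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count < this.max) {
      this.count += 1;
      return 0;
    }
    return this.windowStart + this.windowSeconds - now;
  }
}

// Two deliveries per 60-second window, to keep the trace short.
const limiter = new FixedWindowLimiter(2, 60);
const first = limiter.tryAcquire(0); // allowed
const second = limiter.tryAcquire(1); // allowed
const third = limiter.tryAcquire(10); // held until the window rolls over
const fourth = limiter.tryAcquire(60); // new window, allowed again
```

Because the counter lives in one place, adding workers does not raise the ceiling; every delivery draws from the same window.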

If you'd rather not build the hold-honor-redeliver machinery yourself, SimpleQ does it out of the box: a downstream 429, 503, or 529 is deferred and redelivered on its Retry-After with no attempt burned, on top of configurable backoff, per-queue rate limiting, a dead-letter queue with replay, and per-job observability. See the backpressure use case for a runnable example, or read how SimpleQ's defer-not-fail model compares in SimpleQ vs. QStash.

Frequently asked questions

What's the difference between backpressure and failure?

Backpressure is a downstream service telling you "not now": it's healthy and your request is valid, but it can't accept more load this instant. A 429 (rate limited), 503 (service unavailable), or 529 (overloaded) is backpressure. A failure is a 500 on a valid request, a timeout, or a connection reset, where something actually went wrong. The distinction matters because a failure should burn a retry attempt and a backpressure signal should not.
