A managed job queue is a hosted service that durably stores background work and delivers it to your code with retries, rate limiting, backpressure, idempotency, and a dead-letter queue built in — so you stop operating Redis and writing retry loops. Push-based queues (SimpleQ) call your webhook; pull-based queues (SQS) make your workers poll; self-hosted queues (BullMQ on Redis) make you run the broker. You need one once background work must survive crashes and can't be silently lost. You don't need one for tiny apps, and you want a workflow engine instead when the job is really a multi-step orchestration.
"Job queue" is one of those terms that means five different things depending on who's talking. To a Rails dev it's Sidekiq. To an AWS team it's SQS plus a Lambda. To a platform engineer it's RabbitMQ with a dead-letter exchange. This post defines the category cleanly — what a managed job queue is, how the three delivery models differ, the capabilities that actually matter, and the concrete signals that tell you you've outgrown inline code and cron. It's also honest about when you don't need one.
What a managed job queue actually is
Strip away the implementations and a job queue does one thing: it accepts a unit of work now and guarantees that work gets executed later, even if the process that created it dies a millisecond after enqueueing. That decoupling — between accepting work and doing it — is the whole point. The accept side returns fast; the execution side runs on its own schedule with its own reliability guarantees.
A managed job queue is that, run as a service. You don't provision a broker, tune its persistence, or page yourself when it falls over. You get an API to enqueue, a durable store you don't operate, and an execution path with retries and a dead-letter queue already wired up. The category is sometimes called a "transport" — the Stripe-for-backend framing: you hand it work, it handles delivery, you handle the logic.
There's a meaningful split inside "managed," though, around how work gets to your code:
- Push (webhook delivery). The queue calls your endpoint when there's work. You expose an HTTP handler; the queue POSTs the job to it. No poll loop, no idle workers waiting. SimpleQ works this way — you POST a job, SimpleQ durably stores it and POSTs it to your own worker URL.
- Pull (polling). Your workers continuously ask the queue for work, process it, and acknowledge it. You own the poll loop and the worker fleet. SQS is the canonical example.
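To make the pull model concrete, here is a minimal in-memory sketch of the poll loop your workers own. `InMemoryPullQueue` and `pollOnce` are hypothetical stand-ins loosely modeled on SQS's receive-then-delete pattern, not the AWS SDK. A worker signals success by deleting a message, and signals failure by leaving it in place so it reappears once its visibility timeout expires (simulated here by `releaseExpired`).

```typescript
// Hypothetical pull-queue message, loosely modeled on SQS's receive/delete shape.
interface PulledMessage {
  id: string;
  body: string;
  receiveCount: number;
}

class InMemoryPullQueue {
  private visible: PulledMessage[] = [];
  private inFlight = new Map<string, PulledMessage>();

  send(id: string, body: string): void {
    this.visible.push({ id, body, receiveCount: 0 });
  }

  // Hand out up to `max` messages and hide them until deleted or released.
  receive(max: number): PulledMessage[] {
    const batch = this.visible.splice(0, max);
    for (const m of batch) {
      m.receiveCount += 1;
      this.inFlight.set(m.id, m);
    }
    return batch;
  }

  // Acknowledge: the message is gone for good.
  delete(id: string): void {
    this.inFlight.delete(id);
  }

  // Simulates visibility timeouts expiring: un-deleted messages reappear.
  releaseExpired(): void {
    for (const m of this.inFlight.values()) this.visible.push(m);
    this.inFlight.clear();
  }

  get depth(): number {
    return this.visible.length + this.inFlight.size;
  }
}

// One iteration of the worker-owned poll loop; returns how many succeeded.
async function pollOnce(
  queue: InMemoryPullQueue,
  handler: (body: string) => Promise<void>,
): Promise<number> {
  const batch = queue.receive(10);
  let done = 0;
  for (const msg of batch) {
    try {
      await handler(msg.body);
      queue.delete(msg.id); // success is signaled by deleting
      done += 1;
    } catch {
      // Failure is signaled by NOT deleting; the message comes back later.
    }
  }
  return done;
}
```

A real worker calls `pollOnce` in a continuous loop (SQS supports long polling to cut down on empty receives), which is exactly the always-on poller a push queue removes.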
A managed job queue is transport, not an execution sandbox. It doesn't run your functions on its infrastructure; it durably moves a job to a URL you control and waits for you to report the outcome. That's the difference between a queue and a code-execution platform — and it's why a queue stays simple: your code, your runtime, your dependencies.
The three models: self-hosted, pull, and push
Most teams pass through these in order as they grow. Here's how they compare on the dimensions that actually cost you time:
| Dimension | Self-hosted (Redis + BullMQ) | Pull / managed (SQS) | Push / managed (SimpleQ) |
|---|---|---|---|
| Who runs the broker | You (provisioning, failover, upgrades) | Provider | Provider |
| How work reaches code | Workers poll Redis | Workers poll the queue | Queue POSTs to your endpoint |
| Idle worker cost | Always-on processes | Always-on pollers | None — serverless-friendly |
| Retries & backoff | You configure in code | Redrive policy + your loop | Built in (exp/fixed, maxAttempts) |
| Rate limiting upstream | DIY (e.g. shared bucket) | DIY | Per-queue, built in |
| Dead-letter queue | Manual setup | DLQ + manual redrive | DLQ with single + bulk replay |
| Best fit | Full control, in-VPC | AWS-native, high throughput | AI/API backends, serverless |
None of these is universally "better." Self-hosting wins when you need the broker inside your VPC or you have a platform team that wants total control. Pull wins for very high, steady throughput inside AWS. Push wins when your work is bursty, calls external APIs, and you'd rather not run a poll loop or keep idle workers warm — which describes most AI and API-dependent backends. If you're weighing specific products, the head-to-head pages cover the trade-offs in detail.
The capabilities that actually matter
A queue that only stores and delivers is half a queue. The reason to adopt one is the reliability machinery around delivery. These are the capabilities to evaluate, and what each one buys you:
- Retries with backoff. When execution fails, the queue re-delivers on a schedule (exponential or fixed) up to a cap. SimpleQ supports configurable backoff and a `maxAttempts` cap (up to 20). This is the single feature most hand-rolled background jobs get wrong.
- Rate limiting. A per-queue limit (e.g. `rateLimitMax` per `rateLimitWindow`) caps how fast work is delivered, so you respect an upstream API's quota across your whole fleet, not per worker, where limits multiply with fleet size and cause storms.
- Backpressure. When the downstream is throttled (a 429/503/529 with a Retry-After), the job is deferred and re-delivered later instead of burning a retry attempt. A job can ride out a sustained rate limit and still complete.
- Dead-letter queue (DLQ). After the last attempt fails, the job lands in a DLQ instead of vanishing: inspectable, and replayable one-by-one or in bulk once you've fixed the cause.
- Idempotency. A publish-boundary `idempotencyKey` dedupes enqueues, so a retried POST from your own app doesn't create two jobs.
- Signed delivery. For push queues, HMAC-SHA256 signing (SimpleQ uses an `x-simpleq-signature` header over the raw body) lets your endpoint verify the request really came from the queue.
- Delayed jobs. Schedule a one-shot job for later (SimpleQ supports a `delay` of up to 24h) without standing up cron.
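The retry-or-dead-letter decision a queue makes after each failed attempt can be sketched as a small pure function. This is an illustration under stated assumptions, not SimpleQ's documented schedule: `baseDelayMs` and `maxDelayMs` are hypothetical parameters, exponential delays double from the base, and `maxAttempts` is assumed to count every delivery including the first.

```typescript
type Backoff = "exponential" | "fixed";

interface RetryPolicy {
  backoff: Backoff;
  maxAttempts: number; // SimpleQ caps this at 20
  baseDelayMs: number; // hypothetical: delay before the first retry
  maxDelayMs: number; // hypothetical: ceiling on any single delay
}

type NextStep = { kind: "retry"; delayMs: number } | { kind: "dead-letter" };

// `attempt` is the 1-based number of the attempt that just failed.
function nextStep(policy: RetryPolicy, attempt: number): NextStep {
  if (attempt >= policy.maxAttempts) {
    // Out of attempts: park the job in the DLQ instead of dropping it.
    return { kind: "dead-letter" };
  }
  const raw =
    policy.backoff === "exponential"
      ? policy.baseDelayMs * 2 ** (attempt - 1) // 1x, 2x, 4x, ...
      : policy.baseDelayMs; // fixed: same delay every time
  return { kind: "retry", delayMs: Math.min(raw, policy.maxDelayMs) };
}
```

Production schedules usually add random jitter to each delay so that jobs which failed together don't all retry in lockstep.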
If you build a queue yourself, you build all of these yourself — and the order you discover you need them is usually: retries (week one), DLQ (the first time a bug eats a thousand jobs), rate limiting (the first 429 storm), idempotency (the first double-charge). A managed queue front-loads them.
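Even with a managed push queue, your endpoint still does one of these itself: checking the signature. A minimal sketch, assuming the `x-simpleq-signature` header is a hex-encoded HMAC-SHA256 of the raw body keyed by a per-queue signing secret. The hex encoding, the absence of any prefix, and where the secret comes from are all assumptions to check against SimpleQ's docs.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if `header` is the HMAC-SHA256 of `rawBody` under `secret`.
// Assumes a bare hex digest in the header (no "sha256=" style prefix).
function verifySignature(
  rawBody: Buffer | string,
  header: string | undefined,
  secret: string,
): boolean {
  if (!header) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  // Invalid hex decodes to a shorter buffer, which fails the length check.
  const received = Buffer.from(header, "hex");
  // timingSafeEqual throws on unequal lengths, so check that first.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

The signature covers the raw bytes, so verify before any JSON parsing rewrites them. In Express, `express.raw({ type: "application/json" })` on the route, or the `verify` option of `express.json()`, gives you access to the original buffer.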
How the queue knows whether your job succeeded
Delivery is only half a contract — the queue also needs to know the outcome so it can retry, dead-letter, or back off correctly. With a pull queue you signal this by deleting (or not deleting) the message. With a push queue like SimpleQ, you report it explicitly. SimpleQ uses a three-signal ack protocol:
- ack: `POST /v1/jobs/:id/ack`. The job succeeded; mark it done.
- nack: `POST /v1/jobs/:id/nack` with a `retryable` flag. It failed; retry if retryable, otherwise send it to the DLQ.
- defer: `POST /v1/jobs/:id/defer` with `retryAfter` seconds. The downstream is throttled; come back later without burning an attempt.
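Choosing among the three signals can be factored into one pure function. The sketch below is an illustration, not part of SimpleQ's API: it treats 429/503/529 as throttling (matching the backpressure behavior described earlier), honors a downstream `Retry-After` header in either of its two standard forms (delta-seconds or an HTTP-date), and falls back to a hypothetical 30-second default when the header is missing or unparseable.

```typescript
type Outcome =
  | { signal: "ack" }
  | { signal: "nack"; retryable: boolean }
  | { signal: "defer"; retryAfter: number };

const THROTTLE_STATUSES = new Set([429, 503, 529]);

// Retry-After is either whole seconds ("120") or an HTTP-date.
function parseRetryAfter(value: string | null, now: number = Date.now()): number | undefined {
  if (!value) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);
  const at = Date.parse(trimmed);
  if (Number.isNaN(at)) return undefined;
  return Math.max(0, Math.ceil((at - now) / 1000));
}

// `status` is the downstream HTTP status, or undefined for a network error.
function classify(
  status: number | undefined,
  retryAfterHeader: string | null,
  fallbackSeconds = 30, // hypothetical default
): Outcome {
  if (status !== undefined && status >= 200 && status < 300) {
    return { signal: "ack" };
  }
  if (status !== undefined && THROTTLE_STATUSES.has(status)) {
    // Throttled: "not now", so no attempt is burned.
    return { signal: "defer", retryAfter: parseRetryAfter(retryAfterHeader) ?? fallbackSeconds };
  }
  // Network errors, 408, and 5xx are usually transient; other 4xx won't fix themselves.
  const retryable = status === undefined || status === 408 || status >= 500;
  return { signal: "nack", retryable };
}
```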
This separation matters. A naive system treats every non-200 as a failure and burns attempts on transient throttling; the defer signal turns a 429 into "not now" rather than "failed." SimpleQ has two delivery modes that govern when the outcome is due: standard mode expects your handler to finish within a hard 15-second webhook timeout, while ack mode lets you return 200 immediately and report the outcome later, which is what long-running AI calls need. SimpleQ ships queue templates for exactly this: an `anthropic` template (up to 600s) and an `openai` template (up to 300s).
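```ts
import express from "express";

const app = express();
app.use(express.json());

type Signal = "ack" | "nack" | "defer";

// Hypothetical stand-in for your business logic (e.g. an OpenAI call). On
// failure it should throw an error carrying the downstream HTTP `status`.
async function callOpenAI(payload: unknown): Promise<void> {
  console.log("processing", payload);
}

// SimpleQ POSTs the job here. We run the logic, then report the outcome.
// Responding 200 before the work is done assumes the queue uses ack mode.
app.post("/jobs/process", async (req, res) => {
  const { id, payload } = req.body;

  // Return fast; we'll report the real outcome via the ack protocol.
  res.sendStatus(200);

  let signal: Signal = "ack";
  let body: Record<string, unknown> = {};
  try {
    await callOpenAI(payload); // your business logic, your runtime
  } catch (err) {
    const status = (err as { status?: number } | undefined)?.status;
    if (status === 429 || status === 503 || status === 529) {
      // Downstream throttled us: defer, don't burn an attempt.
      signal = "defer";
      body = { retryAfter: 30 };
    } else {
      // No status means a network error, transient like a 5xx; other 4xx are not.
      signal = "nack";
      body = { retryable: status === undefined || status >= 500 };
    }
  }

  // Report outside the try above, so a failed ack is never turned into a nack.
  try {
    await reportOutcome(id, signal, body);
  } catch (reportErr) {
    // Don't let a failed report crash the process; log it for follow-up.
    console.error(`could not report ${signal} for job ${id}`, reportErr);
  }
});

async function reportOutcome(
  id: string,
  signal: Signal,
  body: Record<string, unknown> = {},
): Promise<void> {
  const res = await fetch(`https://api.simpleq.io/v1/jobs/${id}/${signal}`, {
    method: "POST",
    headers: {
      Authorization: "Bearer sq_live_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`${signal} for job ${id} failed with HTTP ${res.status}`);
  }
}

app.listen(3000); // port is arbitrary
```

Signals you've outgrown inline code, cron, and Redis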
You rarely decide to adopt a queue in the abstract — you notice symptoms. Here are the concrete ones, roughly in the order they appear as an app grows:
1. You're doing slow work inside the request. A user clicks "generate" and waits 12 seconds while you call gpt-4o-mini or claude-sonnet-4-6 synchronously. The request times out under load, and a crash mid-call loses the work entirely.
2. You retry by hand. There's a `try/catch` with `setTimeout(retry, 1000)` somewhere, and it doesn't survive a deploy or a crash. This is the clearest signal: you've started reimplementing a queue, badly.
3. Cron jobs silently skip. Your nightly sync runs on a single cron entry. When the box restarts at the wrong minute, it just doesn't run, and nobody notices until the data's stale.
4. Third-party 429s take down a feature. You hammer OpenAI or Anthropic from every worker, hit the rate limit, and have no shared budget, so retries pile into a storm instead of smoothing out.
5. You're running Redis just for jobs. You stood up Redis and BullMQ, and now you're patching it, sizing it, and getting paged for it, all for what is, conceptually, "call this API and retry if it fails."
6. Failures vanish. A job fails its last retry and disappears. You have no record, no replay, and no way to answer "did that webhook actually get processed?"
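The fourth signal is at heart a counting problem: the budget has to be shared, not per worker. Here is a minimal fixed-window limiter sketch. Its `max` and `windowSeconds` parameters echo SimpleQ's `rateLimitMax` and `rateLimitWindow`, but the `FixedWindowLimiter` class is hypothetical and keeps its counters in a single process. Across a real fleet those counters must live in one shared place, which is exactly what a per-queue limit in a managed queue provides.

```typescript
// At most `max` acquisitions per `windowSeconds`, tracked separately per key
// (e.g. per queue). The clock is injectable so the behavior is testable.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly max: number,
    private readonly windowSeconds: number,
    private readonly now: () => number = Date.now,
  ) {}

  // Returns 0 if the caller may proceed now, else milliseconds until the next window.
  tryAcquire(key: string): number {
    const t = this.now();
    const windowMs = this.windowSeconds * 1000;
    const start = Math.floor(t / windowMs) * windowMs; // current window boundary
    const w = this.windows.get(key);
    if (!w || w.start !== start) {
      // First request in a fresh window resets the count.
      this.windows.set(key, { start, count: 1 });
      return 0;
    }
    if (w.count < this.max) {
      w.count += 1;
      return 0;
    }
    return start + windowMs - t; // budget spent: wait for the window to roll over
  }
}
```

A non-zero return is a wait hint. Deferring the job for that long, instead of failing it, is what keeps retries from piling into a storm.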
If you nodded at two or more of these, you've outgrown the inline-and-cron phase. The fix isn't more clever retry code in your app — it's moving durability, retries, and the DLQ out of your app and into a queue. Retries specifically are where most teams lose data first; we go deep on why in why webhook retries matter.
When you don't need one (and when you want a workflow engine instead)
Adopting a queue too early is a real failure mode — you add a network hop and an external dependency to solve a problem you don't have yet. You probably don't need a managed job queue if:
- Your background work is genuinely fire-and-forget and you're fine losing it on a crash (analytics pings, best-effort cache warms).
- Your app does no slow or external-API work in the request path, and has no background or delayed jobs.
- You're a tiny app with a single process and no reliability requirements yet: `await` the work inline until it actually hurts.
The other place teams misfire is reaching for a queue when the problem is orchestration. A job queue moves one unit of work reliably; it doesn't model a process with multiple steps, branches, waits, and shared state. If your task is "call this API, retry on failure, respect the rate limit" — that's a job, and a queue is the right tool. If it's "do step A, wait for a human approval, then fan out to N sub-tasks, then aggregate" — that's a workflow, and a workflow engine is the better fit.
The most common over-engineering we see: a team builds a multi-step "workflow" whose every step is really just one durable API call with retries. That's a queue wearing a workflow's clothes. Reach for orchestration when you have genuine cross-step state and branching — not just to get retries. We cover the dividing line in queue vs workflow engine.
What adoption actually looks like
With a managed push queue, getting started is two API calls and one endpoint. You create a queue with the reliability policy you want, expose a handler for SimpleQ to POST to, and enqueue work:
```shell
# 1. Create a queue with retries + a per-queue rate limit
curl -X POST https://api.simpleq.io/v1/queues \
  -H "Authorization: Bearer sq_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ai-jobs",
    "maxAttempts": 5,
    "backoff": "exponential",
    "rateLimitMax": 50,
    "rateLimitWindow": 60
  }'

# 2. Enqueue a job; SimpleQ will POST it to your worker endpoint
curl -X POST https://api.simpleq.io/v1/queues/ai-jobs/jobs \
  -H "Authorization: Bearer sq_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "payload": {
      "model": "gpt-4o-mini",
      "messages": [{ "role": "user", "content": "Summarize this..." }]
    },
    "idempotencyKey": "summary_doc-123"
  }'
```

The queue now owns durability, the retry schedule, the rate-limit budget shared across your whole fleet, and the DLQ. Your code owns the call to OpenAI or Anthropic and the ack/nack/defer signal. The official TypeScript SDK, `@simpleq/sdk` on npm, wraps these calls, and because the API is HTTP-first underneath, any language that can make a request works.
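If you want to enqueue from TypeScript without the SDK, the second curl call translates directly to `fetch`. This is a sketch assuming the same endpoint and body shape as the curl example; `buildEnqueueRequest`, `enqueue`, and `summaryKey` are hypothetical helpers, and `@simpleq/sdk` remains the packaged route. Building the request separately from sending it keeps the idempotency key easy to test.

```typescript
interface EnqueueRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Mirrors the curl enqueue call: same URL pattern, headers, and JSON body.
function buildEnqueueRequest(
  apiKey: string,
  queue: string,
  payload: unknown,
  idempotencyKey: string,
): EnqueueRequest {
  return {
    url: `https://api.simpleq.io/v1/queues/${encodeURIComponent(queue)}/jobs`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ payload, idempotencyKey }),
    },
  };
}

// Hypothetical key scheme: derive the key from the identity of the work, not a
// random value, so a retried POST from your app maps to the same job.
function summaryKey(docId: string): string {
  return `summary_${docId}`;
}

async function enqueue(
  apiKey: string,
  queue: string,
  payload: unknown,
  idempotencyKey: string,
): Promise<unknown> {
  const { url, init } = buildEnqueueRequest(apiKey, queue, payload, idempotencyKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`enqueue failed with HTTP ${res.status}`);
  return res.json();
}
```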
If you've recognized the signals above and want the retry engine, rate limiting, backpressure, and dead-letter replay without operating Redis, that's exactly what SimpleQ provides — a push-based managed job queue that delivers to your own endpoint. See the AI job processing use case for an end-to-end example, or the comparison pages if you're evaluating it against SQS, BullMQ, and the rest. And if you're still deciding between a queue and orchestration, read queue vs workflow engine and why job retries matter before you build.
Ship reliable async work in minutes.
Free tier covers 10,000 job executions a month. No credit card.