What is the difference between a managed queue and a self-hosted queue?

A self-hosted queue (Redis + BullMQ, RabbitMQ) means you operate the broker: provisioning, persistence config, failover, version upgrades, and monitoring are all yours. A managed queue removes that operational surface — the durability, retry engine, rate limiter, and dead-letter handling are run by the provider. The trade is control for time: self-hosting gives you maximum flexibility, managed gives you a working retry-and-DLQ pipeline on day one.

What is the difference between push and pull job queues?

In a pull queue (like SQS), your workers continuously poll the queue, fetch messages, process them, and delete them on success — you own the poll loop, visibility timeouts, and the worker fleet that runs it. In a push queue (like SimpleQ), the queue calls your endpoint over HTTP when there's work; there's no poll loop and no idle workers. Push pairs naturally with serverless and request/response style workers because delivery is just an inbound HTTP request you already know how to handle.

When do I actually need a job queue?

You need one when work must outlive the request that triggered it and must not be lost: anything that calls a flaky third-party API, anything slow enough to time out an HTTP request, anything you retry by hand today, or anything you run on cron that occasionally just... doesn't run. If your background work is a fire-and-forget call you're fine losing, you don't need a queue yet.

Is a job queue the same as a workflow engine?

No. A job queue moves a single unit of work reliably from A to B with retries and rate limits. A workflow engine orchestrates multi-step processes with branching, fan-out, waits, and durable state across steps. Many teams reach for a workflow engine when a queue would do — if your 'workflow' is really one durable job that calls one API, a queue is simpler and cheaper to reason about. See our breakdown in queue vs workflow engine.

What is a managed job queue? (and when you need one)

TL;DR

A managed job queue is a hosted service that durably stores background work and delivers it to your code with retries, rate limiting, backpressure, idempotency, and a dead-letter queue built in — so you stop operating Redis and writing retry loops. Push-based queues (SimpleQ) call your webhook; pull-based queues (SQS) make your workers poll; self-hosted queues (BullMQ on Redis) make you run the broker. You need one once background work must survive crashes and can't be silently lost. You don't need one for tiny apps, and you want a workflow engine instead when the job is really a multi-step orchestration.

"Job queue" is one of those terms that means five different things depending on who's talking. To a Rails dev it's Sidekiq. To an AWS team it's SQS plus a Lambda. To a platform engineer it's RabbitMQ with a dead-letter exchange. This post defines the category cleanly — what a managed job queue is, how the three delivery models differ, the capabilities that actually matter, and the concrete signals that tell you you've outgrown inline code and cron. It's also honest about when you don't need one.

What a managed job queue actually is

Strip away the implementations and a job queue does one thing: it accepts a unit of work now and guarantees that work gets executed later, even if the process that created it dies a millisecond after enqueueing. That decoupling — between accepting work and doing it — is the whole point. The accept side returns fast; the execution side runs on its own schedule with its own reliability guarantees.

A managed job queue is that, run as a service. You don't provision a broker, tune its persistence, or page yourself when it falls over. You get an API to enqueue, a durable store you don't operate, and an execution path with retries and a dead-letter queue already wired up. The category is sometimes called a "transport" — the Stripe-for-backend framing: you hand it work, it handles delivery, you handle the logic.

There's a meaningful split inside "managed," though, around how work gets to your code:

Push (webhook delivery). The queue calls your endpoint when there's work. You expose an HTTP handler; the queue POSTs the job to it. No poll loop, no idle workers waiting. SimpleQ works this way — you POST a job, SimpleQ durably stores it and POSTs it to your own worker URL.
Pull (polling). Your workers continuously ask the queue for work, process it, and acknowledge it. You own the poll loop and the worker fleet. SQS is the canonical example.
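
To make the contrast concrete, here is a minimal in-memory sketch of the two delivery models. Everything in it (`InMemoryQueue`, `pollOnce`, `deliverAll`) is a hypothetical name invented for illustration; it is not the SQS or SimpleQ API, and it ignores durability, visibility timeouts, and HTTP entirely.

```typescript
// Hypothetical in-memory model; neither SQS nor SimpleQ exposes these names.
type Job = { id: string; payload: string };
type Handler = (job: Job) => boolean; // true = processed successfully

class InMemoryQueue {
  private jobs: Job[] = [];

  enqueue(job: Job): void {
    this.jobs.push(job);
  }

  // Pull model: the worker asks for the next job...
  receive(): Job | undefined {
    return this.jobs[0];
  }

  // ...and must acknowledge it explicitly once it has been processed.
  acknowledge(id: string): void {
    this.jobs = this.jobs.filter((job) => job.id !== id);
  }

  // Push model: the queue drives delivery and removes jobs the handler accepts.
  deliverAll(handler: Handler): number {
    let delivered = 0;
    for (const job of [...this.jobs]) {
      if (handler(job)) {
        this.acknowledge(job.id);
        delivered += 1;
      }
    }
    return delivered;
  }

  get size(): number {
    return this.jobs.length;
  }
}

// Pull: the worker owns the loop. Returns false when there was nothing to do,
// which is where a real worker would sleep and poll again.
function pollOnce(queue: InMemoryQueue, handler: Handler): boolean {
  const job = queue.receive();
  if (job === undefined) return false;
  if (handler(job)) queue.acknowledge(job.id);
  return true;
}
```

In the pull version your code calls `pollOnce` in a loop, whether or not there is work. In the push version your code is only the `handler`; the queue decides when it runs.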

It delivers to your endpoint — it doesn't run your code

A managed job queue is transport, not an execution sandbox. It doesn't run your functions on its infrastructure; it durably moves a job to a URL you control and waits for you to report the outcome. That's the difference between a queue and a code-execution platform — and it's why a queue stays simple: your code, your runtime, your dependencies.

The three models: self-hosted, pull, and push

Most teams pass through these in order as they grow. Here's how they compare on the dimensions that actually cost you time:

Dimension	Self-hosted (Redis + BullMQ)	Pull / managed (SQS)	Push / managed (SimpleQ)
Who runs the broker	You (provisioning, failover, upgrades)	Provider	Provider
How work reaches code	Workers poll Redis	Workers poll the queue	Queue POSTs to your endpoint
Idle worker cost	Always-on processes	Always-on pollers	None — serverless-friendly
Retries & backoff	You configure in code	Redrive policy + your loop	Built in (exp/fixed, maxAttempts)
Rate limiting upstream	DIY (e.g. shared bucket)	DIY	Per-queue, built in
Dead-letter queue	Manual setup	DLQ + manual redrive	DLQ with single + bulk replay
Best fit	Full control, in-VPC	AWS-native, high throughput	AI/API backends, serverless

None of these is universally "better." Self-hosting wins when you need the broker inside your VPC or you have a platform team that wants total control. Pull wins for very high, steady throughput inside AWS. Push wins when your work is bursty, calls external APIs, and you'd rather not run a poll loop or keep idle workers warm — which describes most AI and API-dependent backends. If you're weighing specific products, the head-to-head pages cover the trade-offs in detail.

The capabilities that actually matter

A queue that only stores and delivers is half a queue. The reason to adopt one is the reliability machinery around delivery. These are the capabilities to evaluate, and what each one buys you:

Retries with backoff. When execution fails, the queue re-delivers on a schedule (exponential or fixed) up to a cap. SimpleQ supports configurable backoff and a maxAttempts cap (up to 20). This is the single feature most hand-rolled background jobs get wrong.
Rate limiting. A per-queue limit (e.g. rateLimitMax per rateLimitWindow) caps how fast work is delivered, so you respect an upstream API's quota across your whole fleet — not per worker, which multiplies and causes storms.
Backpressure. When the downstream is throttled (a 429/503/529 with a Retry-After), the job is deferred and re-delivered later instead of burning a retry attempt. A job can ride out a sustained rate limit and still complete.
Dead-letter queue (DLQ). After the last attempt fails, the job lands in a DLQ instead of vanishing — inspectable, and replayable one-by-one or in bulk once you've fixed the cause.
Idempotency. A publish-boundary idempotencyKey dedupes enqueues, so a retried POST from your own app doesn't create two jobs.
Signed delivery. For push queues, HMAC-SHA256 signing (SimpleQ uses an x-simpleq-signature header over the raw body) lets your endpoint verify the request really came from the queue.
Delayed jobs. Schedule a one-shot job for later (SimpleQ supports a delay up to 24h) without standing up cron.
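
As a rough illustration of what "retries with backoff" means in practice, the sketch below computes the delay schedule for a retry policy. The `backoff` and `maxAttempts` names mirror the fields mentioned above; `baseDelayMs` and `maxDelayMs` are assumptions added for the example and are not documented SimpleQ settings.

```typescript
type BackoffKind = "exponential" | "fixed";

interface RetryPolicy {
  backoff: BackoffKind;
  maxAttempts: number; // total delivery attempts, including the first
  baseDelayMs: number; // assumed parameter, for illustration only
  maxDelayMs: number; // assumed cap, for illustration only
}

// Delay before retry number `retry` (1 = the first retry after the initial attempt).
function delayBeforeRetry(policy: RetryPolicy, retry: number): number {
  const raw =
    policy.backoff === "fixed"
      ? policy.baseDelayMs
      : policy.baseDelayMs * 2 ** (retry - 1);
  return Math.min(raw, policy.maxDelayMs);
}

// Full schedule: one delay per retry, so maxAttempts - 1 entries.
function retrySchedule(policy: RetryPolicy): number[] {
  const retries = Math.max(policy.maxAttempts - 1, 0);
  return Array.from({ length: retries }, (_, i) => delayBeforeRetry(policy, i + 1));
}
```

With `maxAttempts: 5`, a 1-second base, and a 5-second cap, the exponential schedule is 1s, 2s, 4s, 5s. Production retry engines usually add random jitter on top so that jobs which failed together do not all retry at the same instant; that is left out here to keep the arithmetic visible.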

If you build a queue yourself, you build all of these yourself — and the order you discover you need them is usually: retries (week one), DLQ (the first time a bug eats a thousand jobs), rate limiting (the first 429 storm), idempotency (the first double-charge). A managed queue front-loads them.
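
To give a sense of what "build it yourself" involves for just one of these, here is a minimal fixed-window limiter. It is a sketch under assumptions: the class name is hypothetical, the window is assumed to be expressed in seconds, and a real shared limiter would keep its counter in a store every worker can reach rather than in process memory. It is not a description of how SimpleQ implements rate limiting.

```typescript
// Hypothetical fixed-window limiter: at most `rateLimitMax` deliveries per window.
class FixedWindowLimiter {
  private readonly rateLimitMax: number;
  private readonly windowMs: number;
  private windowStart = 0;
  private used = 0;

  constructor(rateLimitMax: number, rateLimitWindowSec: number) {
    this.rateLimitMax = rateLimitMax;
    this.windowMs = rateLimitWindowSec * 1000; // assumes the window is in seconds
  }

  // Returns 0 if a delivery may proceed now, otherwise the number of
  // milliseconds to wait until the next window opens.
  tryAcquire(nowMs: number): number {
    if (nowMs - this.windowStart >= this.windowMs) {
      this.windowStart = nowMs;
      this.used = 0;
    }
    if (this.used < this.rateLimitMax) {
      this.used += 1;
      return 0;
    }
    return this.windowStart + this.windowMs - nowMs;
  }
}
```

The important property is that there is one counter for the whole queue. Give each worker its own instance and the effective limit becomes `rateLimitMax` times the number of workers, which is exactly the multiplication that causes 429 storms.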

How the queue knows whether your job succeeded

Delivery is only half a contract — the queue also needs to know the outcome so it can retry, dead-letter, or back off correctly. With a pull queue you signal this by deleting (or not deleting) the message. With a push queue like SimpleQ, you report it explicitly. SimpleQ uses a three-signal ack protocol:

ack — POST /v1/jobs/:id/ack. The job succeeded; mark it done.
nack — POST /v1/jobs/:id/nack with a retryable flag. It failed; retry if retryable, otherwise send it to the DLQ.
defer — POST /v1/jobs/:id/defer with retryAfter seconds. The downstream is throttled; come back later without burning an attempt.

This separation matters. A naive system treats every non-200 as a failure and burns attempts on transient throttling; the defer signal turns a 429 into "not now" rather than "failed." Two delivery modes govern when the outcome is reported. Standard mode expects your handler to finish within a hard 15-second webhook timeout. Ack mode lets you return 200 immediately and report the outcome later, which is what long-running AI calls need. SimpleQ ships queue templates for exactly this: an anthropic template (up to 600s) and an openai template (up to 300s).
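
One way to keep the three signals straight is to centralize the decision in a single function. The sketch below maps a downstream HTTP status to ack, nack, or defer. The function name, the 30-second default, and the choice to treat a missing status (for example a network error) as retryable are all assumptions for illustration; only the signal names and the `retryable` and `retryAfter` fields come from the protocol described above.

```typescript
type Outcome =
  | { signal: "ack" }
  | { signal: "nack"; retryable: boolean }
  | { signal: "defer"; retryAfter: number };

// Statuses the text above describes as throttling rather than failure.
const THROTTLE_STATUSES = new Set([429, 503, 529]);

function classifyOutcome(
  status: number | undefined, // undefined = no HTTP response at all
  retryAfterHeader?: string, // Retry-After in seconds; HTTP-date form is not handled
  defaultRetryAfter = 30, // assumed fallback
): Outcome {
  if (status !== undefined && status >= 200 && status < 300) {
    return { signal: "ack" };
  }
  if (status !== undefined && THROTTLE_STATUSES.has(status)) {
    const parsed = Number(retryAfterHeader);
    const usable = retryAfterHeader !== undefined && Number.isFinite(parsed) && parsed > 0;
    return { signal: "defer", retryAfter: usable ? Math.ceil(parsed) : defaultRetryAfter };
  }
  // Assumption: network errors and 5xx are transient; other statuses are not.
  return { signal: "nack", retryable: status === undefined || status >= 500 };
}
```

A throttled response with `Retry-After: 12` becomes a defer of 12 seconds, a 500 becomes a retryable nack, and a 400 becomes a non-retryable nack that goes straight to the DLQ.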

worker.ts

import express from "express";

const app = express();
app.use(express.json());

// SimpleQ POSTs the job here. We run the logic, then report the outcome.
app.post("/jobs/process", async (req, res) => {
  const { id, payload } = req.body;

  // Return fast; we'll report the real outcome via the ack protocol.
  res.sendStatus(200);

  let signal: "ack" | "nack" | "defer" = "ack";
  let body: Record<string, unknown> = {};

  try {
    await callOpenAI(payload); // your business logic, your runtime
  } catch (err: any) {
    if (err?.status === 429) {
      // Downstream throttled us: defer, don't burn an attempt.
      signal = "defer";
      body = { retryAfter: 30 };
    } else {
      signal = "nack";
      body = { retryable: err?.status >= 500 };
    }
  }

  // Report outside the try block so a failed ack call is never mistaken for a job failure.
  await reportOutcome(id, signal, body).catch((reportErr) => {
    console.error(`could not report ${signal} for job ${id}`, reportErr);
  });
});

async function reportOutcome(
  id: string,
  signal: "ack" | "nack" | "defer",
  body: Record<string, unknown> = {},
) {
  const response = await fetch(`https://api.simpleq.io/v1/jobs/${id}/${signal}`, {
    method: "POST",
    headers: {
      Authorization: "Bearer sq_live_...",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  // fetch only rejects on network errors, so surface HTTP errors explicitly.
  if (!response.ok) {
    throw new Error(`SimpleQ ${signal} failed with HTTP ${response.status}`);
  }
}
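
The handler above trusts any caller that can reach the URL. Since deliveries are signed with HMAC-SHA256 over the raw body and carried in the x-simpleq-signature header, a worker would normally verify that before doing any work. The sketch below shows the general technique with Node's built-in crypto module. It assumes the signature is hex-encoded and that you hold a shared signing secret; neither detail is confirmed by this article, so check the SimpleQ docs for the actual encoding before relying on it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the HMAC-SHA256 of the raw request body (hex encoding is an assumption).
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Returns true only if the header matches the signature we compute ourselves.
function verifySignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Note that the signature covers the raw bytes, not the parsed JSON. With Express, that means capturing the body before `express.json()` discards it, for example through the parser's `verify` option, and passing that exact string to `verifySignature`.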

Signals you've outgrown inline code, cron, and Redis

You rarely decide to adopt a queue in the abstract — you notice symptoms. Here are the concrete ones, roughly in the order they appear as an app grows:

1. You're doing slow work inside the request. A user clicks "generate" and waits 12 seconds while you call gpt-4o-mini or claude-sonnet-4-6 synchronously. The request times out under load, and a crash mid-call loses the work entirely.
2. You retry by hand. There's a try/catch with setTimeout(retry, 1000) somewhere, and it doesn't survive a deploy or a crash. This is the clearest signal: you've started reimplementing a queue, badly.
3. Cron jobs silently skip. Your nightly sync runs on a single cron entry. When the box restarts at the wrong minute, it just doesn't run, and nobody notices until the data's stale.
4. Third-party 429s take down a feature. You hammer OpenAI or Anthropic from every worker, hit the rate limit, and have no shared budget, so retries pile into a storm instead of smoothing out.
5. You're running Redis just for jobs. You stood up Redis and BullMQ, and now you're patching it, sizing it, and getting paged for it, all for what is, conceptually, "call this API and retry if it fails."
6. Failures vanish. A job fails its last retry and disappears. You have no record, no replay, and no way to answer "did that webhook actually get processed?"

If you nodded at two or more of these, you've outgrown the inline-and-cron phase. The fix isn't more clever retry code in your app — it's moving durability, retries, and the DLQ out of your app and into a queue. Retries specifically are where most teams lose data first; we go deep on why in why webhook retries matter.
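
For a feel of what "failures vanish" looks like once it is fixed, here is a toy model of bounded retries with a dead-letter list and bulk replay. `RetryingQueue` and its methods are hypothetical names for this example. It runs in memory and retries immediately, so it demonstrates the bookkeeping only, not durability or backoff.

```typescript
interface DeadLetter<T> {
  job: T;
  attempts: number;
  lastError: string;
}

// Hypothetical in-memory model of bounded retries plus a dead-letter list.
class RetryingQueue<T> {
  readonly deadLetters: DeadLetter<T>[] = [];
  private readonly maxAttempts: number;

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  // Runs the handler up to maxAttempts times. On final failure the job is
  // parked in the dead-letter list instead of being dropped.
  process(job: T, handler: (job: T) => void): boolean {
    let lastError = "";
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        handler(job);
        return true;
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    this.deadLetters.push({ job, attempts: this.maxAttempts, lastError });
    return false;
  }

  // Re-runs every parked job, e.g. after a fix is deployed.
  // Returns how many succeeded; jobs that fail again are parked again.
  replayAll(handler: (job: T) => void): number {
    const parked = this.deadLetters.splice(0);
    let succeeded = 0;
    for (const entry of parked) {
      if (this.process(entry.job, handler)) succeeded += 1;
    }
    return succeeded;
  }
}
```

The point of the dead-letter list is that a job which exhausts its attempts still exists, along with its last error, so "did that job actually get processed?" has an answer and the fix can be followed by a replay.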

When you don't need one (and when you want a workflow engine instead)

Adopting a queue too early is a real failure mode — you add a network hop and an external dependency to solve a problem you don't have yet. You probably don't need a managed job queue if:

Your background work is genuinely fire-and-forget and you're fine losing it on a crash (analytics pings, best-effort cache warms).
Your app does no slow or external-API work in the request path, and has no background or delayed jobs.
You're a tiny app with a single process and no reliability requirements yet — await the work inline until it actually hurts.

The other place teams misfire is reaching for a queue when the problem is orchestration. A job queue moves one unit of work reliably; it doesn't model a process with multiple steps, branches, waits, and shared state. If your task is "call this API, retry on failure, respect the rate limit" — that's a job, and a queue is the right tool. If it's "do step A, wait for a human approval, then fan out to N sub-tasks, then aggregate" — that's a workflow, and a workflow engine is the better fit.

Don't model a queue as a workflow

The most common over-engineering we see: a team builds a multi-step "workflow" whose every step is really just one durable API call with retries. That's a queue wearing a workflow's clothes. Reach for orchestration when you have genuine cross-step state and branching — not just to get retries. We cover the dividing line in queue vs workflow engine.

What adoption actually looks like

With a managed push queue, getting started is two API calls and one endpoint. You create a queue with the reliability policy you want, expose a handler for SimpleQ to POST to, and enqueue work:

create-and-enqueue.sh

# 1. Create a queue with retries + a per-queue rate limit
curl -X POST https://api.simpleq.io/v1/queues \
  -H "Authorization: Bearer sq_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ai-jobs",
    "maxAttempts": 5,
    "backoff": "exponential",
    "rateLimitMax": 50,
    "rateLimitWindow": 60
  }'

# 2. Enqueue a job. SimpleQ will POST it to your worker endpoint
curl -X POST https://api.simpleq.io/v1/queues/ai-jobs/jobs \
  -H "Authorization: Bearer sq_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "payload": {
      "model": "gpt-4o-mini",
      "messages": [{ "role": "user", "content": "Summarize this..." }]
    },
    "idempotencyKey": "summary_doc-123"
  }'

The queue now owns durability, the retry schedule, the rate-limit budget shared across your whole fleet, and the DLQ. Your code owns the call to OpenAI or Anthropic and the ack/nack/defer signal. The official TypeScript SDK — @simpleq/sdk on npm — wraps these calls, and because the API is HTTP-first underneath, any language that can make a request works.
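
If you would rather enqueue from application code than from curl, the same request can be built with plain `fetch`. The sketch below reuses the endpoint, headers, and body fields from the curl example; the function names are hypothetical, and this is not the @simpleq/sdk API, which may look quite different.

```typescript
interface EnqueueRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the HTTP request without sending it, so it can be inspected or tested.
function buildEnqueueRequest(
  queueName: string,
  payload: unknown,
  idempotencyKey: string,
  apiKey: string,
): EnqueueRequest {
  return {
    url: `https://api.simpleq.io/v1/queues/${encodeURIComponent(queueName)}/jobs`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ payload, idempotencyKey }),
    },
  };
}

// Sends the request and surfaces HTTP errors, since fetch only rejects on network failure.
async function enqueue(request: EnqueueRequest): Promise<void> {
  const response = await fetch(request.url, request.init);
  if (!response.ok) {
    throw new Error(`enqueue failed with HTTP ${response.status}`);
  }
}
```

Deriving the `idempotencyKey` from something stable in your domain, such as a document ID, is what makes a retried enqueue safe: the second POST carries the same key and is deduplicated rather than creating a second job.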

If you've recognized the signals above and want the retry engine, rate limiting, backpressure, and dead-letter replay without operating Redis, that's exactly what SimpleQ provides — a push-based managed job queue that delivers to your own endpoint. See the AI job processing use case for an end-to-end example, or the comparison pages if you're evaluating it against SQS, BullMQ, and the rest. And if you're still deciding between a queue and orchestration, read queue vs workflow engine and why job retries matter before you build.

Frequently asked questions

What is a managed job queue?

A managed job queue is a hosted service that durably accepts background work over an API, stores it, and delivers it to your code for execution, handling retries, backoff, rate limiting, dead-letter queues, and idempotency for you. You don't run any broker infrastructure. With a push-based queue like SimpleQ, you POST a job to the API and the queue POSTs it to your own webhook or worker endpoint; you run the business logic, the queue runs the transport.

