Workflow engines (Temporal, Cadence, Inngest, Trigger.dev) are powerful, but overkill for ~90% of teams. Most startups need a queue with strong retries, scheduling, and rate limits — not a stateful DSL with deterministic replay. This post explains the distinction, when each is appropriate, and the migration path between them.
Every team building async infrastructure eventually asks the same question: do we need a workflow engine? It's a fair question — Temporal is brilliant, Inngest is well-designed, and once you've watched a coworker reimplement retry-with-backoff for the third time, anything with 'durable' in its tagline starts to look appealing.
But the question is often the wrong one. The real question is: what do you need that a queue doesn't already give you? For most teams, the honest answer is 'nothing yet.' This post is about the difference, when each tool earns its complexity, and the decision framework we use when teams ask.
Definitions, briefly
A queue runs individual jobs reliably. You push a job onto it; the queue stores it, hands it to a worker, retries on failure, and gives you a record of what happened. Examples: SQS, SimpleQ, BullMQ, Sidekiq.
A workflow engine runs multi-step processes whose state survives across steps. You write a workflow function that calls activities; the engine durably tracks which steps have run, snapshots state, replays deterministically after crashes, and gives you primitives like signals, timers, child workflows, and human-in-the-loop pauses. Examples: Temporal, Cadence, Inngest (functions), Trigger.dev.
Crucially, every workflow engine has a queue inside it — but exposes a different abstraction. Workflow engines are queues plus durable state plus replay. Whether you need the 'plus' parts is the entire question.
What queues do well
If you're paying attention to where teams actually spend their reliability effort, queues handle most of it. The four big wins:
- Decoupling. Your request handler enqueues; your worker processes. The two scale independently and fail independently.
- Retries. Transient failures (429s, 5xxs, network hiccups) retry with backoff. Permanent failures land in a dead-letter queue.
- Scheduling. Run something in 5 minutes, next Tuesday, or every weekday morning. No cron servers.
- Rate limiting. Token buckets per upstream (OpenAI, Twilio, Stripe) so bursty traffic doesn't break provider limits.
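The retry-and-dead-letter behavior described above can be sketched as a small policy function. This is an illustrative sketch, not any queue's actual API; the constants, the `nextStep` name, and the status-code classification are assumptions chosen for the example.

```typescript
// Sketch of a queue-style retry policy: exponential backoff with full jitter,
// plus a dead-letter decision after a maximum number of attempts.
const BASE_MS = 500;
const CAP_MS = 60_000;
const MAX_ATTEMPTS = 5;

// Transient failures (429, 5xx) are worth retrying; other 4xx are permanent.
function isTransient(status: number): boolean {
  return status === 429 || status >= 500;
}

// Full-jitter backoff: a random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(attempt: number): number {
  return Math.random() * Math.min(CAP_MS, BASE_MS * 2 ** attempt);
}

type Decision = { action: "retry"; delayMs: number } | { action: "dead-letter" };

function nextStep(status: number, attempt: number): Decision {
  if (!isTransient(status) || attempt >= MAX_ATTEMPTS) {
    return { action: "dead-letter" };
  }
  return { action: "retry", delayMs: backoffMs(attempt) };
}
```

The jitter matters: without it, a fleet of workers that failed together retries together, hammering the upstream in synchronized waves.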
If you have an AI app, a webhook delivery system, an outbound messaging product, or any batch-API workload — that's a queue's job description. You can build it on a queue today and it will probably be the answer for years.
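The per-upstream rate limiting mentioned above is usually a token bucket per provider. A minimal sketch, assuming nothing beyond the standard library; the class shape, the `allow` method, and the example rates are illustrative, not any product's real interface.

```typescript
// Minimal token bucket: `capacity` bounds the burst, `refillPerSec` the
// sustained rate. One bucket per upstream keeps a burst aimed at one
// provider from consuming another provider's budget.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if a job may run now; false means "requeue and retry later".
  allow(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Hypothetical per-provider buckets (rates are made-up examples).
const buckets = {
  openai: new TokenBucket(10, 2), // 10-job burst, 2 jobs/sec sustained
  twilio: new TokenBucket(5, 1),
};
```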
What workflow engines add
Workflow engines earn their complexity when the unit of work spans many steps with state between them, and that state is expensive (or impossible) to reconstruct. The classic examples:
- Long-running orchestrations. A signup flow that calls Stripe, sends an email, waits for the user to verify (potentially days), then provisions resources. The state has to survive across those waits.
- Human-in-the-loop processes. Approval workflows, manual review queues, anything where the workflow pauses and resumes when a human acts.
- Sagas with compensation. Multi-step transactions where step 4 failing means undoing steps 1–3 in a specific order.
- Deterministic replay. Critical workflows (financial, regulatory) where you need to prove what happened and re-run it identically.
- Versioned business logic. Workflows that started before today's deploy must continue running yesterday's code; new workflows use today's.
If you read that list and shrugged, you don't need a workflow engine today.
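To make the saga bullet concrete, here is what compensation looks like stripped of any engine: run steps in order, and on failure undo the completed ones in reverse. This is a sketch with hypothetical step names; a workflow engine's real contribution is remembering the `done` list durably across a crash, which this in-process version cannot do.

```typescript
// A minimal saga runner: each step has a `run` and a `compensate`.
// On failure, completed steps are undone most-recent-first.
type Step = {
  name: string;
  run: () => void;
  compensate: () => void;
};

function runSaga(steps: Step[]): string[] {
  const log: string[] = [];
  const done: Step[] = [];
  for (const step of steps) {
    try {
      step.run();
      log.push(`ran ${step.name}`);
      done.push(step);
    } catch {
      log.push(`failed ${step.name}`);
      // Compensate in reverse order: undo step 3, then 2, then 1.
      for (const s of done.reverse()) {
        s.compensate();
        log.push(`compensated ${s.name}`);
      }
      return log;
    }
  }
  return log;
}
```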
Comparison at a glance
| Capability | Queue | Workflow engine |
|---|---|---|
| Run a single reliable job | ✓ | ✓ |
| Retries with backoff | ✓ | ✓ |
| Scheduled jobs | ✓ | ✓ |
| Rate limits per upstream | Usually ✓ | Sometimes |
| Multi-step chaining | Simple chains | ✓ |
| Durable state across steps | — | ✓ |
| Signals / waits / timers | — | ✓ |
| Deterministic replay | — | ✓ |
| Human-in-the-loop | — | ✓ |
| Operational complexity | Low | High |
| Learning curve | Hours | Weeks |
| Vendor lock-in | Low | High (DSL) |
| Cost at 1M jobs/mo | $$ | $$$ |
The hidden cost of a workflow engine
Workflow engines come with three costs people underestimate:
1. The DSL is real lock-in
When you write a Temporal workflow, you're not writing normal Go or TypeScript — you're writing in a constrained, deterministic subset: no Math.random(), no Date.now(), no direct I/O outside activities. Every workflow function is a long-term commitment; rewriting one means handling old in-flight versions. This is a feature, not a bug — it's what enables replay. But it's a serious commitment all the same.
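A toy illustration of why that determinism constraint exists. This is not Temporal's actual API — it is a sketch of the replay idea: the engine records each activity result in an event log, and on replay it feeds the recorded results back in, so the workflow function must make identical decisions both times. Any inline Math.random() or Date.now() would break that guarantee.

```typescript
type EventLog = number[];

// Runs an "activity": records the result on first execution,
// replays the recorded result on subsequent runs.
function makeActivityRunner(log: EventLog) {
  let cursor = 0;
  return (compute: () => number): number => {
    if (cursor < log.length) return log[cursor++]; // replay path: no side effect
    const result = compute();                      // live path: real side effect
    log.push(result);
    cursor++;
    return result;
  };
}

// A deterministic workflow: all nondeterminism (here, a "quote" from an
// external service) goes through the activity runner, never inline.
function pricingWorkflow(activity: (f: () => number) => number): number {
  const quote = activity(() => Math.floor(Math.random() * 100)); // side effect
  const fee = quote > 50 ? 10 : 5;                               // pure logic
  return quote + fee;
}

const log: EventLog = [];
const live = pricingWorkflow(makeActivityRunner(log));   // executes the activity
const replay = pricingWorkflow(makeActivityRunner(log)); // replays from the log
// live === replay: the crash-recovery run reaches the same state without
// re-calling Math.random().
```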
2. Operating it isn't free
Self-hosted Temporal is a small Cassandra (or Postgres) cluster, a server, and a UI. Production-grade self-hosting is a half-time job for an engineer. Managed offerings (Temporal Cloud, Inngest) solve that — but at a price that grows with workflow count, not just job count.
3. Teams use 5% of the capability
The most common pattern after adopting a workflow engine: the team uses it as a fancy queue. They never write a signal, never use a timer that spans days, never build a saga. They paid the DSL tax and the ops tax for features they didn't need. A queue would have done.
A decision framework
Here's the heuristic we use. Walk down the list. If you can answer 'yes' to any of them, you might need a workflow engine. If you can't, a queue with strong primitives is the right answer.
1. Do you have processes that span hours or days and need to survive deploys without restarting?
2. Do you have human-in-the-loop steps where the process must pause until a person acts?
3. Do you have sagas with explicit compensation logic across many services?
4. Are you in a regulated environment that requires deterministic replay of business logic?
5. Do you need to version workflow logic such that old in-flight workflows finish on old code?
If you said 'no' to all five, congratulations — you don't need a workflow engine. You need a queue with retries, scheduling, rate limits, and good observability. That's most teams.
Migration path: queue → workflow engine
If you start with a queue and later find you need workflow features, migration is usually painless: the parts that don't need durable state stay as queue jobs, and the multi-step coordination moves into a workflow. Workflow engines can call into your existing job handlers — they don't replace your code, they orchestrate it.
Migrating the other direction — from a workflow engine to a queue — is much harder. The DSL leaks into your code, your team builds intuition around durable state, and untangling that takes months. This asymmetry is a strong argument for starting simple.
What we recommend
For a typical startup shipping AI, SaaS, or automation products, the right starting stack is:
- A hosted queue with retries, scheduling, rate limits, and dashboards (this is what SimpleQ does).
- A few small workers that consume jobs from named queues.
- Simple chains for multi-step work — job A's success enqueues job B.
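The "simple chains" pattern in that last bullet needs nothing beyond enqueue-on-success. A sketch with an in-process queue standing in for real workers; the queue names and handlers are illustrative, not SimpleQ's API. Note there is no shared durable state: everything job B needs travels in the payload.

```typescript
type Job = { queue: string; payload: string };

const pending: Job[] = [];
const processed: string[] = [];

function enqueue(queue: string, payload: string): void {
  pending.push({ queue, payload });
}

const handlers: Record<string, (payload: string) => void> = {
  // Step A: on success, enqueue step B with the same payload.
  "resize-image": (payload) => {
    processed.push(`resized ${payload}`);
    enqueue("notify-user", payload);
  },
  // Step B: terminal step of the chain.
  "notify-user": (payload) => {
    processed.push(`notified about ${payload}`);
  },
};

// Tiny worker loop standing in for real queue workers.
function drain(): void {
  while (pending.length > 0) {
    const job = pending.shift()!;
    handlers[job.queue](job.payload);
  }
}

enqueue("resize-image", "avatar.png");
drain();
// processed: ["resized avatar.png", "notified about avatar.png"]
```

If a handler throws, the queue's normal retry policy covers it — the chain inherits reliability from the queue rather than from an orchestrator.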
Adopt a workflow engine when — and only when — you hit one of the five questions above. By then, you'll know exactly which capability you need and you can choose the right tool deliberately, rather than as a defense against complexity you haven't actually encountered yet.
See the docs for SimpleQ's queue primitives, or the use cases for end-to-end patterns including step chaining and scheduled jobs.