Workflow engines (Temporal, Cadence, Inngest, Trigger.dev) are powerful, but overkill for ~90% of teams. Most startups need a queue with strong retries, scheduling, and rate limits — not a stateful DSL with deterministic replay. This post explains the distinction, when each is appropriate, and the migration path between them.
Every team building async infrastructure eventually asks the same question: do we need a workflow engine? It's a fair question — Temporal is brilliant, Inngest is well-designed, and once you've watched a coworker reimplement retry-with-backoff for the third time, anything with 'durable' in its tagline starts to look appealing.
But the question is often the wrong one. The real question is: what do you need that a queue doesn't already give you? For most teams, the honest answer is 'nothing yet.' This post is about the difference, when each tool earns its complexity, and the decision framework we use when teams ask.
Definitions, briefly
A queue runs individual jobs reliably. You push a job onto it; the queue stores it, hands it to a worker, retries on failure, and gives you a record of what happened. Examples: SQS, SimpleQ, BullMQ, Sidekiq.
A workflow engine runs multi-step processes whose state survives across steps. You write a workflow function that calls activities; the engine durably tracks which steps have run, snapshots state, replays deterministically after crashes, and gives you primitives like signals, timers, child workflows, and human-in-the-loop pauses. Examples: Temporal, Cadence, Inngest (functions), Trigger.dev.
Crucially, every workflow engine has a queue inside it — but exposes a different abstraction. Workflow engines are queues plus durable state plus replay. Whether you need the 'plus' parts is the entire question.
What queues do well
If you're paying attention to where teams actually spend their reliability effort, queues handle most of it. The four big wins:
- Decoupling. Your request handler enqueues; your worker processes. The two scale independently and fail independently.
- Retries. Transient failures (429s, 5xxs, network hiccups) retry with backoff. Permanent failures land in a dead-letter queue.
- Scheduling. Run something in 5 minutes, next Tuesday, or every weekday morning. No cron servers.
- Rate limiting. Token buckets per upstream (OpenAI, Twilio, Stripe) so bursty traffic doesn't break provider limits.
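The retry-and-dead-letter behavior described above can be sketched as a small policy function. This is an illustrative sketch, not any queue's actual API; the constants, the `nextStep` name, and the status-code classification are assumptions chosen for the example.

```typescript
// Sketch of a queue-style retry policy: exponential backoff with full jitter,
// plus a dead-letter decision after a maximum number of attempts.
const BASE_MS = 500;
const CAP_MS = 60_000;
const MAX_ATTEMPTS = 5;

// Transient failures (429, 5xx) are worth retrying; other 4xx are permanent.
function isTransient(status: number): boolean {
  return status === 429 || status >= 500;
}

// Full-jitter backoff: a random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(attempt: number): number {
  return Math.random() * Math.min(CAP_MS, BASE_MS * 2 ** attempt);
}

type Decision = { action: "retry"; delayMs: number } | { action: "dead-letter" };

function nextStep(status: number, attempt: number): Decision {
  if (!isTransient(status) || attempt >= MAX_ATTEMPTS) {
    return { action: "dead-letter" };
  }
  return { action: "retry", delayMs: backoffMs(attempt) };
}
```

The jitter matters: without it, a fleet of workers that failed together retries together, hammering the upstream in synchronized waves.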
If you have an AI app, a webhook delivery system, an outbound messaging product, or any batch-API workload — that's a queue's job description. You can build it on a queue today and it will probably be the answer for years.
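The per-upstream rate limiting mentioned above is usually a token bucket per provider. A minimal sketch, assuming nothing beyond the standard library; the class shape, the `allow` method, and the example rates are illustrative, not any product's real interface.

```typescript
// Minimal token bucket: `capacity` bounds the burst, `refillPerSec` the
// sustained rate. One bucket per upstream keeps a burst aimed at one
// provider from consuming another provider's budget.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if a job may run now; false means "requeue and retry later".
  allow(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Hypothetical per-provider buckets (rates are made-up examples).
const buckets = {
  openai: new TokenBucket(10, 2), // 10-job burst, 2 jobs/sec sustained
  twilio: new TokenBucket(5, 1),
};
```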
What workflow engines add
Workflow engines earn their complexity when the unit of work spans many steps with state between them, and that state is expensive (or impossible) to reconstruct. The classic examples:
- Long-running orchestrations. A signup flow that calls Stripe, sends an email, waits for the user to verify (potentially days), then provisions resources. The state has to survive across those waits.
- Human-in-the-loop processes. Approval workflows, manual review queues, anything where the workflow pauses and resumes when a human acts.
- Sagas with compensation. Multi-step transactions where step 4 failing means undoing steps 1–3 in a specific order.
- Deterministic replay. Critical workflows (financial, regulatory) where you need to prove what happened and re-run it identically.
- Versioned business logic. Workflows that started before today's deploy must continue running yesterday's code; new workflows use today's.
If you read that list and shrugged, you don't need a workflow engine today.
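To make the saga bullet concrete, here is what compensation looks like stripped of any engine: run steps in order, and on failure undo the completed ones in reverse. This is a sketch with hypothetical step names; a workflow engine's real contribution is remembering the `done` list durably across a crash, which this in-process version cannot do.

```typescript
// A minimal saga runner: each step has a `run` and a `compensate`.
// On failure, completed steps are undone most-recent-first.
type Step = {
  name: string;
  run: () => void;
  compensate: () => void;
};

function runSaga(steps: Step[]): string[] {
  const log: string[] = [];
  const done: Step[] = [];
  for (const step of steps) {
    try {
      step.run();
      log.push(`ran ${step.name}`);
      done.push(step);
    } catch {
      log.push(`failed ${step.name}`);
      // Compensate in reverse order: undo step 3, then 2, then 1.
      for (const s of done.reverse()) {
        s.compensate();
        log.push(`compensated ${s.name}`);
      }
      return log;
    }
  }
  return log;
}
```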
Comparison at a glance
| Capability | Queue | Workflow engine |
|---|---|---|
| Run a single reliable job | ✓ | ✓ |
| Retries with backoff | ✓ | ✓ |
| Scheduled jobs | ✓ | ✓ |
| Rate limits per upstream | Usually ✓ | Sometimes |
| Multi-step chaining | Simple chains | ✓ |
| Durable state across steps | — | ✓ |
| Signals / waits / timers | — | ✓ |
| Deterministic replay | — | ✓ |
| Human-in-the-loop | — | ✓ |
| Operational complexity | Low | High |
| Learning curve | Hours | Weeks |
| Vendor lock-in | Low | High (DSL) |
| Cost at 1M jobs/mo | $$ | $$$ |
The hidden cost of a workflow engine
Workflow engines come with three costs people underestimate:
1. The DSL is real lock-in
When you write a Temporal workflow, you're not writing normal Go or TypeScript — you're writing in a constrained, deterministic subset: no Math.random(), no Date.now(), no direct I/O outside activities. Every workflow function is a long-term commitment; rewriting one means handling old in-flight versions. This is a feature, not a bug — it's what enables replay. But it's a serious commitment all the same.
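A toy illustration of why that determinism constraint exists. This is not Temporal's actual API — it is a sketch of the replay idea: the engine records each activity result in an event log, and on replay it feeds the recorded results back in, so the workflow function must make identical decisions both times. Any inline Math.random() or Date.now() would break that guarantee.

```typescript
type EventLog = number[];

// Runs an "activity": records the result on first execution,
// replays the recorded result on subsequent runs.
function makeActivityRunner(log: EventLog) {
  let cursor = 0;
  return (compute: () => number): number => {
    if (cursor < log.length) return log[cursor++]; // replay path: no side effect
    const result = compute();                      // live path: real side effect
    log.push(result);
    cursor++;
    return result;
  };
}

// A deterministic workflow: all nondeterminism (here, a "quote" from an
// external service) goes through the activity runner, never inline.
function pricingWorkflow(activity: (f: () => number) => number): number {
  const quote = activity(() => Math.floor(Math.random() * 100)); // side effect
  const fee = quote > 50 ? 10 : 5;                               // pure logic
  return quote + fee;
}

const log: EventLog = [];
const live = pricingWorkflow(makeActivityRunner(log));   // executes the activity
const replay = pricingWorkflow(makeActivityRunner(log)); // replays from the log
// live === replay: the crash-recovery run reaches the same state without
// re-calling Math.random().
```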
2. Operating it isn't free
Self-hosted Temporal is a small Cassandra (or Postgres) cluster, a server, and a UI. Production-grade self-hosting is a half-time job for an engineer. Managed offerings (Temporal Cloud, Inngest) solve that — but at a price that grows with workflow count, not just job count.
3. Teams use 5% of the capability
The most common pattern after adopting a workflow engine: the team uses it as a fancy queue. They never write a signal, never use a timer that spans days, never build a saga. They paid the DSL tax and the ops tax for features they didn't need. A queue would have done.
A decision framework
Here's the heuristic we use. Walk down the list. If you can answer 'yes' to any of them, you might need a workflow engine. If you can't, a queue with strong primitives is the right answer.
1. Do you have processes that span hours or days and need to survive deploys without restarting?
2. Do you have human-in-the-loop steps where the process must pause until a person acts?
3. Do you have sagas with explicit compensation logic across many services?
4. Are you in a regulated environment that requires deterministic replay of business logic?
5. Do you need to version workflow logic such that old in-flight workflows finish on old code?
If you said 'no' to all five, congratulations — you don't need a workflow engine. You need a queue with retries, scheduling, rate limits, and good observability. That's most teams.
Migration path: queue → workflow engine
If you start with a queue and later find you need workflow features, migration is usually painless: the parts that don't need durable state stay as queue jobs, and the multi-step coordination moves into a workflow. Workflow engines can call into your existing job handlers — they don't replace your code, they orchestrate it.
Migrating the other direction — from a workflow engine to a queue — is much harder. The DSL leaks into your code, your team builds intuition around durable state, and untangling that takes months. This asymmetry is a strong argument for starting simple.
What we recommend
For a typical startup shipping AI, SaaS, or automation products, the right starting stack is:
- A hosted queue with retries, scheduling, rate limits, and dashboards (this is what SimpleQ does).
- A few small workers that consume jobs from named queues.
- Simple chains for multi-step work — job A's success enqueues job B.
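The "simple chains" pattern in that last bullet needs nothing beyond enqueue-on-success. A sketch with an in-process queue standing in for real workers; the queue names and handlers are illustrative, not SimpleQ's API. Note there is no shared durable state: everything job B needs travels in the payload.

```typescript
type Job = { queue: string; payload: string };

const pending: Job[] = [];
const processed: string[] = [];

function enqueue(queue: string, payload: string): void {
  pending.push({ queue, payload });
}

const handlers: Record<string, (payload: string) => void> = {
  // Step A: on success, enqueue step B with the same payload.
  "resize-image": (payload) => {
    processed.push(`resized ${payload}`);
    enqueue("notify-user", payload);
  },
  // Step B: terminal step of the chain.
  "notify-user": (payload) => {
    processed.push(`notified about ${payload}`);
  },
};

// Tiny worker loop standing in for real queue workers.
function drain(): void {
  while (pending.length > 0) {
    const job = pending.shift()!;
    handlers[job.queue](job.payload);
  }
}

enqueue("resize-image", "avatar.png");
drain();
// processed: ["resized avatar.png", "notified about avatar.png"]
```

If a handler throws, the queue's normal retry policy covers it — the chain inherits reliability from the queue rather than from an orchestrator.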
Adopt a workflow engine when — and only when — you hit one of the five questions above. By then, you'll know exactly which capability you need and you can choose the right tool deliberately, rather than as a defense against complexity you haven't actually encountered yet.
See the docs for SimpleQ's queue primitives, or the use cases for end-to-end patterns including step chaining and scheduled jobs.