Blog

Engineering notes on reliable execution.

Practical, opinionated writing about AI infrastructure, queues, retries, rate limits, and the sharp edges of running async work in production.

Why every AI app needs a reliable execution layer

LLM calls fail, hit rate limits, and time out, and making them inline is quietly one of the biggest reliability risks in modern AI apps. Here's how to fix it.

Temporal is amazing — and overkill for most teams. Here's a pragmatic breakdown of when a queue is enough and when you need a full workflow engine.

Tokens-per-minute limits, retry-after headers, and shared buckets across your fleet — a practical guide that doesn't end in 429s.

If your webhook delivery doesn't have backoff, jitter, idempotency, and a dead-letter queue, you are probably losing customer events without noticing.

When to reach for Redis, when to reach for a hosted queue, and when to skip both. A guide to staying simple as your traffic grows.