Infrastructure · RATE002 · 2026-03-17 · 4 min read

Stop AI From Writing Naive Rate Limiters

stablestack .
WARNING RATE002 in-memory-rate-limiter
api/middleware.py:3

from collections import defaultdict

request_counts = defaultdict(int)

def rate_limit(ip):
    request_counts[ip] += 1
    if request_counts[ip] > 100:
        return False
    return True

In-memory rate limiter will reset on deploy and doesn't share state across instances.

stablestack add-rate-limiter
FIXED RATE002 Redis-backed sliding window

from stablestack_rate_limiter import rate_limit

@rate_limit(requests=100, window=60)
async def api_handler(request):
    return await process(request)

Survives deploys. Shared across instances. Fails open if Redis is down.

Ask an AI assistant to add rate limiting to your API. Nine times out of ten, you'll get an in-memory dictionary with timestamps. It works in development. It breaks in every way that matters in production.

The problem with in-memory rate limiters

An in-memory rate limiter stores request counts in a Python dictionary or JavaScript Map. The moment you deploy, that state resets to zero — every user gets a fresh allowance. If you're running multiple instances behind a load balancer, each instance has its own counter. A user hitting instance A and instance B gets double the rate limit.

AI assistants write this pattern because it's the simplest solution that satisfies the prompt. It works in the test they run. It even works in staging if you're running a single instance. It fails silently in production — no errors, no warnings, just rate limits that don't actually limit.

What production rate limiting looks like

from stablestack_rate_limiter import rate_limit

@rate_limit(requests=100, window=60)  # 100 req/min
async def api_handler(request):
    return await process(request)

A production rate limiter needs three properties:

1. Persistence across deploys — limits survive restarts
2. Shared state across instances — one counter per user, not per process
3. Atomic operations — no race conditions between check and increment

Redis gives you all three. It's designed for exactly this: fast, atomic, shared state with built-in expiration. A sliding window rate limiter in Redis is about 50 lines of code and handles millions of requests.
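The core of that sliding window can be sketched storage-agnostically. Below is an illustrative in-process version (not StableStack's implementation), with the Redis operation each step maps to noted in comments — in production the timestamp log lives in a Redis sorted set and the three steps run atomically in one pipeline or Lua script:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Sliding-window core. The per-key timestamp log shown here as a
    deque is, in production, a Redis sorted set manipulated atomically
    (ZREMRANGEBYSCORE + ZCARD + ZADD in a single pipeline/script)."""

    def __init__(self, requests, window, clock=time.monotonic):
        self.requests = requests       # max requests per window
        self.window = window           # window length in seconds
        self.clock = clock             # injectable for testing
        self.log = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key):
        now = self.clock()
        log = self.log[key]
        # Evict timestamps older than the window (Redis: ZREMRANGEBYSCORE)
        while log and log[0] <= now - self.window:
            log.popleft()
        # Count what's left in the window (Redis: ZCARD)
        if len(log) >= self.requests:
            return False
        # Record this request (Redis: ZADD)
        log.append(now)
        return True
```

Because the window slides with the clock instead of resetting at fixed boundaries, a burst at the edge of one window can't combine with the next window's allowance.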

Why AI gets this wrong every time

AI models pattern-match to the most common training examples. Rate limiting tutorials overwhelmingly demonstrate in-memory approaches because they're simpler to explain and don't require Redis setup. The model doesn't distinguish between "works for a tutorial" and "works in production."

We see three failure modes:

- Counter reset on deploy: every deployment gives users a fresh rate limit window
- Per-instance isolation: 4 instances means 4x the actual rate limit
- Race conditions: checking the count and incrementing it aren't atomic, so burst traffic slips through
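The third failure mode is worth seeing concretely. This sketch (class names are illustrative) contrasts the racy check-then-increment with a version that does both in one critical section — the lock here stands in for what Redis gives you across processes for free with a single INCR or a Lua script:

```python
import threading
from collections import defaultdict

class NaiveCounter:
    """Racy: check and increment are two separate steps, so two
    concurrent requests can both pass the check before either
    increments, letting burst traffic slip past the limit."""
    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, key):
        if self.counts[key] >= self.limit:  # step 1: check
            return False
        self.counts[key] += 1               # step 2: increment (too late)
        return True

class AtomicCounter:
    """Check-and-increment as one indivisible operation. In production
    the same guarantee comes from Redis executing INCR (or a Lua
    script) atomically server-side, not from an in-process lock."""
    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)
        self._lock = threading.Lock()

    def allow(self, key):
        with self._lock:
            if self.counts[key] >= self.limit:
                return False
            self.counts[key] += 1
            return True
```

An in-process lock fixes the race only within one instance; it does nothing for the per-instance isolation problem, which is why the atomic operation has to live in the shared store.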

The StableStack approach

We're building a Redis-backed rate limiting template that follows the same pattern as our other add-ons: a checker that flags unprotected endpoints, a scaffold command that installs a drop-in module, and a slash command that instruments your routes.

StableStack already has RATE001, which flags API endpoints without any rate limiting. The template takes it further — when you run the scaffold, you get a production-ready sliding window implementation with Redis, a @rate_limit decorator, configurable windows and limits, and proper error responses with Retry-After headers.
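For illustration, here's roughly what a well-formed rate-limited response carries — a hypothetical helper, not the template's actual API. The 429 status comes from RFC 6585, and Retry-After is defined in RFC 9110:

```python
import math

def too_many_requests(seconds_until_reset):
    """Build a 429 payload. Retry-After tells well-behaved clients
    exactly how long to back off. seconds_until_reset is how long
    until the caller's window frees up a slot."""
    retry_after = max(1, math.ceil(seconds_until_reset))  # round up, never 0
    return {
        "status": 429,
        "headers": {"Retry-After": str(retry_after)},
        "body": {"error": "rate_limited", "retry_after": retry_after},
    }
```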

This is another free add-on. One command installs the module, one slash command instruments your existing routes. No more in-memory counters that reset on deploy.

What to do today

If you have AI-generated rate limiting in your codebase, check for these red flags:

- A dictionary or Map storing request counts
- No Redis, Memcached, or database backing the counter
- Counters initialized at module import time
- No atomic check-and-increment operation
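A rough way to scan for the first two red flags in a Python codebase — the patterns are heuristics, illustrative rather than exhaustive, so adjust them to your code:

```shell
# Flag common in-memory counter shapes in Python sources:
# defaultdict(int) counters and module-level count maps.
grep -rnE 'defaultdict\(int\)|request_counts[[:space:]]*=' --include='*.py' .
```

A hit isn't proof of a broken limiter, but any counter that lives in process memory deserves a second look.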

If you see any of these, your rate limiter is probably not limiting anything in production. StableStack's RATE001 checker flags endpoints that lack rate limiting entirely — run stablestack . to see which routes are exposed.

RATE002 is available with a StableStack license.

pip install stablestack