
Amazon Had Infinite Engineers and Still Shipped From a Stale Wiki

Codalio Team
AI app builder team


In November 2025, Amazon told its engineers to use its in-house AI coding tool for at least 80% of their work each week. The goal was adoption — a metric, tracked weekly, with a number to hit.

Four months later, on March 5, 2026, a single AI-assisted deploy wiped roughly 6.3 million orders and dropped U.S. order volume by about 99%. The root cause wasn't an exotic model failure. The agent had read an outdated internal wiki, inferred what "correct" meant from it, and shipped accordingly.

Then about 1,500 engineers signed a petition. Their argument wasn't "AI is bad." It was that the company had chased a usage target instead of quality — and that they'd rather pick their own tools, like Claude Code, than hit a quota.

Here's the part that should bother every founder: Amazon has effectively unlimited engineers and the best tooling money can buy. It still broke. So the lesson can't be "hire more people" or "buy a better model."


The model wasn't the problem. The missing source of truth was.

Strip away the scale and the headline drama, and the failure is almost embarrassingly simple. The AI did exactly what it was told. It just wasn't told the right thing — because the thing it read was wrong.

A wiki is a description of how something worked at some point in the past, written by a human, rarely updated, and accountable to no test. When an agent treats that as ground truth, it confidently builds the past. There was no authoritative, executable definition of correct behavior sitting upstream of the prompt — so the agent grabbed the most plausible document it could find and ran with it.

An AI agent is only ever as correct as the source it reads. Give it a stale wiki and it will ship stale logic at machine speed. The intelligence of the model is almost beside the point; it's a multiplier on whatever truth you hand it.

This is the trap non-technical founders walk into without noticing. You're not deploying to millions of orders. But you're doing the same thing in miniature: prompting an AI to build your MVP from a Notion page, a Slack thread, three voice memos, and whatever you said last Tuesday. The AI guesses your intent from scattered, aging context. And you don't see the gap as a bug in code — you see it as a customer who churned, a flow that did the wrong thing, a refund you had to issue.

Amazon could absorb 6.3 million lost orders and a public revolt. A two-person startup vibe coding from stale notes has the exact same failure mode and almost no margin to survive it.


Amazon's fix was a reflex. The real fix lives upstream.

When the outage hit, Amazon's response was to require two-person review and senior sign-off on AI-assisted deploys. Human gates, bolted on after the fact.

It's an understandable reflex, and it isn't useless. But notice what it does: it adds friction downstream of the prompt, after the agent has already built from the wrong source. You're now paying senior engineers to catch, by hand, mistakes that originated because nobody defined correct behavior in the first place. It slows everything down and still depends on a reviewer happening to know the wiki was stale.

The contrarian move is to fix the layer everyone skips. Don't review the output harder — fix the input. Put a versioned, executable spec upstream of the prompt, so the truth the agent reads is the truth you actually intended.

A real spec isn't a wiki page. It's the source of truth the build is accountable to. At minimum it pins down:

  • The business logic — what the system should do, stated as rules, not vibes.

  • The edge cases and constraints — what must never happen, in writing.

  • A single versioned source — one place the truth lives, that changes on purpose and leaves a history.

  • Something checkable — behavior defined precisely enough that "did we build the right thing" has an answer before customers find out.
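To make those four properties concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Rule` and `Spec` classes, the order statuses, and the two example rules are hypothetical, and none of it describes Amazon's systems or Codalio's product. It only shows what "rules, constraints, one versioned source, something checkable" can look like once it is written as code rather than as a wiki page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical model: order id -> status ("open", "paid", "shipped").
Orders = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """One business rule, phrased so a machine can check it."""
    name: str
    holds: Callable[[Orders, Orders], bool]  # (before, after) -> rule satisfied?


@dataclass(frozen=True)
class Spec:
    """A single versioned source: the version changes only on purpose."""
    version: str
    rules: Tuple[Rule, ...]

    def violations(self, before: Orders, after: Orders) -> List[str]:
        """Return the names of every rule a proposed change would break."""
        return [rule.name for rule in self.rules if not rule.holds(before, after)]


def no_order_disappears(before: Orders, after: Orders) -> bool:
    # Constraint: something that must never happen.
    return set(before) <= set(after)


def paid_orders_only_move_forward(before: Orders, after: Orders) -> bool:
    # Business logic: a paid order stays paid or becomes shipped.
    return all(
        after.get(order_id) in ("paid", "shipped")
        for order_id, status in before.items()
        if status == "paid"
    )


ORDER_SPEC = Spec(
    version="1.0.0",
    rules=(
        Rule("an existing order must never disappear", no_order_disappears),
        Rule("a paid order may only stay paid or become shipped",
             paid_orders_only_move_forward),
    ),
)

before = {"A1": "paid", "B2": "open"}
bad_change = {"B2": "open"}                     # order A1 was wiped
good_change = {"A1": "shipped", "B2": "open"}   # order A1 moved forward

print(ORDER_SPEC.version, ORDER_SPEC.violations(before, bad_change))
print(ORDER_SPEC.version, ORDER_SPEC.violations(before, good_change))
```

In this sketch the change that wipes an order breaks both rules, while the change that ships it breaks none. The point is that "did we build the right thing" gets an answer before the change goes anywhere near a customer.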

This is Codalio's whole thesis, and the Amazon outage illustrates it at enterprise scale: vibe coding ships prototypes; spec-driven development ships products. Vibing reads a wiki. Shipping reads a spec.


What a founder does Monday morning

You don't need Amazon's budget to apply this. You need to move the truth upstream before you build, not after you break.

Start by writing down what your product must do as rules a stranger could check — not a feature list, but the logic. "When a user does X, the system must do Y, and must never do Z." If you can't state it that plainly, your AI can't build it correctly, and neither could a new hire.
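As a rough illustration of the "X, Y, never Z" template, the sketch below turns one made-up rule into a check a stranger (or a script) could run. The rule, the `cancel_order` stand-in, and the field names are all hypothetical; in practice the function under test would be whatever your AI tool generated.

```python
# Hypothetical rule: "When a user cancels a paid order (X), the system must
# refund the full amount (Y), and must never cancel a shipped order (Z)."

def cancel_order(order: dict) -> dict:
    """Stand-in implementation; in practice this is the code under review."""
    if order["status"] == "shipped":
        raise ValueError("shipped orders cannot be cancelled")
    refund_cents = order["total_cents"] if order["status"] == "paid" else 0
    return {**order, "status": "cancelled", "refund_cents": refund_cents}


def check_cancellation_rule(cancel) -> bool:
    """Return True only if `cancel` does Y when X happens and never does Z."""
    # X -> Y: cancelling a paid order refunds it in full.
    paid = {"id": "A1", "status": "paid", "total_cents": 1200}
    result = cancel(paid)
    does_y = result["status"] == "cancelled" and result["refund_cents"] == 1200

    # Never Z: cancelling a shipped order must be refused.
    shipped = {"id": "B2", "status": "shipped", "total_cents": 500}
    try:
        cancel(shipped)
        never_z = False  # the cancellation went through, so Z happened
    except ValueError:
        never_z = True

    return does_y and never_z


print(check_cancellation_rule(cancel_order))
```

If a rule cannot be written in roughly this shape, that is usually a sign it is still a feature description rather than logic.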

Then give that definition one home. Pick a single versioned source of truth and make every prompt build against it. The failure mode to kill is the one Amazon hit: an agent inferring intent from whichever document it found first. When there's one authoritative spec, the agent has no second, staler document to grab.
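One way to enforce "every prompt builds against it" is to fingerprint the spec and refuse to build a prompt from anything that does not match the approved version. The sketch below assumes a spec stored as plain text and uses hypothetical helper names (`spec_fingerprint`, `build_prompt`); it is one possible guard, not a description of how any particular tool works.

```python
import hashlib


def spec_fingerprint(spec_text: str) -> str:
    """Short content hash, so any edit to the spec is visible."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()[:12]


def build_prompt(task: str, spec_text: str, approved_fingerprint: str) -> str:
    """Build a prompt from the one approved spec, or refuse."""
    actual = spec_fingerprint(spec_text)
    if actual != approved_fingerprint:
        raise ValueError(
            f"spec fingerprint {actual} does not match approved {approved_fingerprint}"
        )
    return (
        f"SPEC (fingerprint {actual}):\n{spec_text}\n\n"
        f"TASK:\n{task}\n\n"
        "Follow the spec. If the task conflicts with it, stop and report the conflict."
    )


# Hypothetical spec text and task, for illustration only.
SPEC_TEXT = "When a user cancels a paid order, refund the full amount."
APPROVED = spec_fingerprint(SPEC_TEXT)  # recorded when the spec was last reviewed

prompt = build_prompt("Add a cancel button to the order page.", SPEC_TEXT, APPROVED)
print(prompt)
```

The design choice here is that an edited or out-of-date copy of the spec fails loudly instead of quietly becoming the agent's ground truth.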

Finally, point your AI tools at the spec, not at your scattered notes. The model is fine. The question is whether it's reading the truth you intended or the truth you forgot to update.


Define the product before you build it

Codalio exists for exactly the founder this story should scare: the one shipping fast with AI, working from context that's already drifting out of date. It turns your business logic into a versioned, spec-driven workflow, so your AI builds against an authoritative source of truth instead of guessing from whatever it found first.

Try Codalio before you start building, at codalio.com — you'll walk away with a spec your AI can actually build the right product from, not a prototype you'll be rebuilding after it costs you a customer.

References

  • Amazon engineers revolt over AI tool restrictions — TechRepublic

  • AI-assisted code changes cause major outages at Amazon — OECD.AI incident log

  • Governing AI agents: what the Amazon outage reveals — Wharton AI & Analytics