The 9-Second Extinction Event:
A Practical AI Agent Governance Playbook
Nine seconds was all it took. Here’s what actually happened to PocketOS — and the five-rule governance policy that would have prevented it.
On Friday afternoon, a SaaS founder in Los Angeles asked his AI coding agent to fix a credential mismatch in his staging environment. The agent — Cursor running Claude Opus 4.6 — reasoned about the problem and decided the fix was to delete a Railway volume. It scanned the codebase for an API token, found one in an unrelated file, and executed a curl command.
Nine seconds later, PocketOS’s production database was gone. So were the backups, which Railway stores on the same volume.
When the founder asked the agent why, it produced the most useful sentence in any agent post-mortem this year: “I didn’t verify.” Translation: it had the information it needed to stop, didn’t ask, and acted on a credential that wasn’t provisioned for the task.
This isn’t a story about one agent or one infrastructure provider. 96% of enterprises now run AI agents in production. Only 12% have a centralised way to govern them. That gap — eighty-four points wide, measured by OutSystems’ 2026 survey of 1,900 IT leaders — is the structural condition that produced the PocketOS event. It will produce the next one.
WHAT ACTUALLY HAPPENED
Here’s the mechanical sequence, because the details matter.
Cursor was working on a routine task in PocketOS’s staging environment. It encountered a credential mismatch. A reasonable response would have been to surface the error and ask. Cursor instead decided to “fix” the problem itself. Its fix: delete a Railway volume.
To delete the volume, the agent needed an API token. None was provisioned for this task. So it scanned the codebase and found one — provisioned for managing custom domains, in a file completely unrelated to the staging environment. Railway’s token model is effectively root: every CLI token has blanket permissions across the entire account. The agent had no way to know — and apparently didn’t ask — what else this token could touch.
It executed a curl call to a Railway endpoint that lacked delayed-delete logic. The volume was destroyed in under a second. Because Railway stored the volume’s snapshot backups on the same volume, those went too.
PocketOS spent thirty hours in operational crisis. The most recent recoverable backup was three months old, manually maintained outside the Railway environment. Customers — car rental businesses depending on PocketOS for daily operations — were offline.
The founder later wrote a public post-mortem. Railway’s CEO acknowledged that destructive API endpoints lacked confirmation barriers. Cursor and Anthropic both pointed to the agent’s behaviour as a system-design failure, not a model error. Everyone, in other words, was correct.
WHY THIS HAPPENED — AND WHY IT’LL HAPPEN AGAIN
The PocketOS event isn’t a fluke. It’s the predictable consequence of three architectural failures that exist in roughly every company shipping AI agents into production right now.
Failure 1: Credential blast radius
Almost every infrastructure provider — Railway, AWS, Stripe, Twilio, GitHub — issues credentials with broader scope than the user actually needs at any given moment. Token architectures rarely support per-action scoping; they support per-account or per-resource at best. When an agent scans a codebase for “an API token” to use, it doesn’t find a token. It finds a master key. The agent treats them the same way. The infrastructure treats them the same way. The user assumed — incorrectly — that the agent would stay in its assigned lane.
This is industry-wide and quantified. GitGuardian’s 2025 State of Secrets Sprawl report logged over 28.6 million secrets exposed in public GitHub commits in 2025 alone — a 34% year-over-year increase, the largest jump in the report’s history. 24,008 of those were unique secrets exposed in MCP configuration files specifically. AI-service secrets grew 81% year-over-year; twelve of the top fifteen fastest-growing leaked secret types are AI services. Every one of those credentials is the same kind of root-scoped key the PocketOS agent grabbed.
Failure 2: Backup architecture without isolation
Storing snapshots on the same volume they’re meant to protect is, charitably, a confusion of “backup” with “version history.” A real backup has to survive the deletion of its source. Railway’s design — snapshots stored alongside primary data — means the backups aren’t backups. They’re convenience features. A single API call collapses the entire stack.
This isn’t unique to Railway. The same pattern exists in cloud-native databases that auto-snapshot to storage in the same VPC, or in repository services where “branch protection” can be undone by an account holder. Anywhere recovery and primary data share the same blast radius, you have one storage system, not two.
Failure 3: Agent action without verification
Modern agents can reason brilliantly about whether an action is safe and still execute it anyway. The PocketOS agent’s post-mortem confession is a perfect description of the gap between thinking and acting that current agent harnesses don’t enforce. The model has the ability to ask. The product wrapper doesn’t require it.
These three failures interlock. Any one of them, in isolation, is survivable. All three together, with an agent instructed to “fix the staging environment,” produce a 9-second extinction event.
PocketOS recovered because they had a manual backup outside the architecture. Most companies don’t.
THE FIVE-RULE GOVERNANCE PLAYBOOK
What does an AI agent governance policy that actually prevents this look like? Five rules. None of them require new technology. All of them require the discipline to actually implement them before the next outage forces you to.
RULE 01
Credentials are scoped per task, never per repository or environment.
An agent should never operate under a credential that grants more permissions than the task requires. If your infrastructure provider doesn’t support task-scoped tokens (most don’t, at least not fully), use a credential broker. The pattern: at task start, the agent calls the broker, receives a 15-minute token scoped only to the specific resource it’s authorised to touch, and the broker revokes the token automatically on task completion or expiry. HashiCorp Vault, AWS STS, and Google’s Workload Identity Federation all do this natively; emerging tools like API Stronghold do it for MCP-mediated agents specifically. Without per-task scoping, “blast radius” isn’t a feature you can audit; it’s a function of which file the agent happened to look at.
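A minimal sketch of that broker pattern, using AWS STS since it is one of the services named above. The role ARN, actions, and resource ARN are placeholders; the two mechanics that matter are the inline session policy, which AWS intersects with the role’s own permissions so the agent can never exceed it, and the 900-second expiry, which kills the token even if revocation never fires.

```python
# Sketch of per-task credential brokering with AWS STS (boto3).
# The role ARN, actions, and resource ARN below are placeholders.
import json
import boto3

def broker_task_credentials(task_id: str, resource_arn: str) -> dict:
    """Mint a 15-minute credential scoped to exactly one resource."""
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],  # only what this task needs
            "Resource": resource_arn,
        }],
    }
    resp = boto3.client("sts").assume_role(
        RoleArn="arn:aws:iam::123456789012:role/agent-task-role",  # placeholder
        RoleSessionName=f"agent-{task_id}",
        Policy=json.dumps(session_policy),  # intersected with the role's own policy
        DurationSeconds=900,  # 15 minutes, then the token dies on its own
    )
    # Hand ONLY these temporary keys to the agent, never the account root key.
    return resp["Credentials"]
```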
RULE 02
Destructive operations require explicit pre-flight verification.
Before an agent calls any DELETE, DROP, TRUNCATE, REVOKE, or SEND endpoint, it must produce — in plain text, to the user — its answer to four questions: classify the action, scope the resources, justify the credential, describe the recovery path. The pre-flight prompt iPrompt published in this week’s issue forces exactly this. Paste it above any agent task that could touch production. Thirty seconds of friction prevents nine seconds of destruction.
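What that gate can look like in code: a minimal sketch of a hypothetical gate() helper that sits between the model’s proposed action and the actual API call. The verb list and field names mirror the four questions; everything else is illustrative.

```python
# Sketch of a pre-flight gate for destructive agent actions.
# The dataclass fields mirror Rule 02's four questions; names are illustrative.
from dataclasses import dataclass

DESTRUCTIVE_VERBS = {"DELETE", "DROP", "TRUNCATE", "REVOKE", "SEND"}

@dataclass
class PreflightAnswers:
    classification: str     # what kind of action is this?
    scope: str              # which resources can it touch?
    credential_reason: str  # why is THIS credential authorised for it?
    recovery_path: str      # how do we undo it if it's wrong?

def gate(action: str, answers: PreflightAnswers) -> bool:
    """Refuse destructive actions until the four answers reach a human."""
    verb = action.split()[0].upper()
    if verb not in DESTRUCTIVE_VERBS:
        return True  # non-destructive: no friction
    print(f"DESTRUCTIVE ACTION: {action}")
    for field, value in vars(answers).items():
        print(f"  {field}: {value}")
    # The agent drafts the answers; only the human operator can approve.
    return input("Type 'approve' to proceed: ").strip() == "approve"
```

The design point is that approval happens outside the model: the agent can draft all four answers, but it cannot type “approve”.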
RULE 03
Backups never share blast radius with primary data.
This is the most violated rule in modern cloud architecture. Snapshots in the same VPC, replicas in the same region, branch backups on the same git remote: all of these are “convenience copies,” not backups. A real backup is in a different account, a different region, on a different provider, or on cold storage that requires a manual restore. Test it quarterly: actually delete the primary, actually restore from the backup. If you can’t, you don’t have one.
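One way to satisfy both halves of the rule, a separate blast radius and a restore you actually exercise, is a plain database dump pushed under a second, independent credential. A minimal sketch, assuming Postgres, pg_dump on the PATH, and a hypothetical “backup-account” AWS profile whose keys the primary account and the agent never see:

```python
# Sketch of an off-blast-radius backup: dump the primary database, then push
# the dump to a bucket in a DIFFERENT account via a separate named profile.
# The profile name, bucket name, and DSN are placeholders.
import subprocess
from datetime import datetime, timezone
import boto3

def backup_outside_blast_radius(dsn: str) -> None:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump_name = f"db-{stamp}.dump"
    dump_path = f"/tmp/{dump_name}"
    # Custom-format dump so pg_restore can do selective restores later.
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={dump_path}", dsn],
        check=True,
    )
    # A separate named profile means separate credentials: deleting the
    # primary account cannot touch this bucket.
    session = boto3.Session(profile_name="backup-account")
    session.client("s3").upload_file(
        dump_path, "example-backups-isolated", f"pg/{dump_name}"
    )
```

The quarterly test is then the mirror image: pg_restore from that bucket into a scratch database and verify the data, not just that the file exists.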
RULE 04
Every agent-initiated API call leaves an audit log with the credential it used.
This is the regulator’s question coming for every company in 2027 — “who approved this action?” — and you need to be able to answer it before they ask. If your agent harness doesn’t log credential, action, and outcome, switch harnesses. Codex Workspace Agents (covered in this week’s issue) does this natively. Claude Code’s integrations vary by provider. Custom harnesses without audit logging are a liability waiting to be discovered.
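If you are on a custom harness anyway, the logging itself is the easy part. A minimal sketch as a Python decorator; the field names and file-based sink are illustrative, and the credential is fingerprinted rather than written out raw:

```python
# Sketch of Rule 04: every agent-initiated call records which credential it
# used, what it did, and how it ended. Append-only JSON lines; names illustrative.
import functools
import hashlib
import json
import time

def audited(action: str, credential: str, log_path: str = "agent-audit.jsonl"):
    """Wrap an agent-callable function so it cannot run unlogged."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {
                "ts": time.time(),
                "action": action,
                # Never log the raw secret; a fingerprint still answers
                # "which credential did this?" without re-leaking it.
                "credential_sha256": hashlib.sha256(
                    credential.encode()
                ).hexdigest()[:16],
            }
            try:
                result = fn(*args, **kwargs)
                record["outcome"] = "success"
                return result
            except Exception as exc:
                record["outcome"] = f"error: {exc}"
                raise
            finally:
                with open(log_path, "a") as f:
                    f.write(json.dumps(record) + "\n")
        return wrapper
    return decorator
```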
RULE 05
Run a blast-radius drill quarterly.
Start with the credentials most likely to drift: any token issued more than six months ago, anything labelled “service account” or “admin”, anything wired into CI/CD pipelines, anything an agent has touched in the last week. In a sandbox, ask the agent to enumerate the destructive operations each credential could perform — ordered by recovery difficulty. Save the output as a versioned read-only doc with a date stamp. Rerun in 90 days, then diff the two outputs. Anything new since last quarter is a permission you forgot to revoke. Anything that disappeared is a permission you reset by accident, which is its own problem. The drift between drills is what kills you, not the original setup.
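The diff is the step teams skip, so here is a minimal sketch of it, assuming each drill saved its enumeration as a JSON file mapping credential names to the destructive operations the agent found:

```python
# Sketch of the quarterly diff in Rule 05. Input files are assumed to map
# credential name -> list of destructive operations it could perform.
import json

def diff_drills(prev_path: str, curr_path: str) -> None:
    with open(prev_path) as f:
        prev = json.load(f)
    with open(curr_path) as f:
        curr = json.load(f)
    for cred in sorted(set(prev) | set(curr)):
        before, after = set(prev.get(cred, [])), set(curr.get(cred, []))
        for op in sorted(after - before):
            print(f"NEW since last drill: {cred} can {op}  -> revoke or justify")
        for op in sorted(before - after):
            print(f"GONE since last drill: {cred} lost {op}  -> was this intentional?")

diff_drills("drill-2026-Q1.json", "drill-2026-Q2.json")  # hypothetical file names
```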
THE HARDER CONVERSATION
None of these five rules are technically difficult. Together they take a few weeks to implement at most companies, less for greenfield ones. The reason they don’t exist in most agent deployments isn’t engineering complexity. It’s that nobody has been forced to write them down yet.
That changes the moment your company is the next PocketOS — or the moment the first regulator asks who approved a specific autonomous action. The cost of writing the policy now is one engineering sprint. The cost of writing it after a 9-second extinction event is everything you’ve built since the last manual backup.
The capability shipped this week. GPT-5.5, DeepSeek V4, Mythos. The rails didn’t. The EU couldn’t agree on a draft of its own delay. Anthropic’s own locked-down model leaked in 24 hours through a vendor portal. The gap is now wider than it has ever been — and it’s measurable in seconds. Nine of them, on a Friday afternoon, in Los Angeles.
PocketOS will be fine. Their data was three months old when they recovered, but the company is operational, the founder is writing publicly, the customers are slowly coming back online. The next company to lose its database in 9 seconds won’t be so lucky — because they won’t have a manual backup, or because their customers will be regulated and the breach will trigger a 7% turnover fine, or because the credential the agent found belonged to their payment processor.
The choice isn’t whether to use AI agents. It’s whether to write the rails before, or after, you find out you needed them.
— R. Lauritsen