Capability Is Outrunning Economics.
That’s Not a Bug—It’s the New Business Model.
The compute math behind Sora’s death, what Anthropic’s leaked “Capybara” tier means for your budget, and three companies betting that efficient inference—not raw power—wins the next era.
April 2, 2026 • By R. Lauritsen • ~2,200 words • 10 min read
TL;DR
• Sora burned ~$1M/day in compute and peaked at under 1M users. OpenAI pulled the plug after six months.
• Anthropic’s leaked Mythos model introduces a “Capybara” tier above Opus: more capable and far more expensive to run.
• Jensen Huang’s AGI declaration conveniently reframes “infinite compute demand” as “mission accomplished.”
• The pattern: capability has outrun what the market will pay. Expect 5–10x pricing tiers within six months.
• The winners won’t build the biggest models. They’ll make big models cheap enough to deploy.
The Week That Exposed the Economics
I’ve been writing about AI costs for two years now, and this was the week where the numbers finally stopped being abstract. Everything landed at once.
OpenAI killed Sora on March 24. Per the Wall Street Journal, the product was hemorrhaging roughly a million dollars a day in GPU costs while its user base collapsed from a peak of one million to fewer than half that. Disney, which had committed a billion dollars to a Sora partnership, found out the deal was dead less than an hour before the public announcement. Six months from launch to shutdown.
Three days later, a CMS misconfiguration at Anthropic dumped close to 3,000 unpublished assets into a publicly searchable data store. Among them: a draft blog post describing Claude Mythos, a model in a new “Capybara” tier above Opus. Anthropic confirmed it’s their most capable model ever built. They also confirmed it’s extremely expensive to serve.
Sandwiched between those two events, Jensen Huang went on the Lex Fridman podcast and declared AGI already achieved—defining it, conveniently, as AI that could generate a billion dollars in value, even briefly. Google DeepMind published a cognitive taxonomy the same week setting a much higher bar.
Each of these stories got covered individually. What I haven’t seen anyone do is read them as one story. So here it is: all three are symptoms of the same underlying problem. The AI labs have gotten scary-good at building capability. They haven’t figured out how to make it affordable.
By the Numbers
| Metric | Figure |
| --- | --- |
| Sora daily compute cost | ~$1 million |
| Sora peak users | ~1 million (collapsed to <500K) |
| Sora total attributable revenue | ~$2.1 million |
| Disney partnership value (dead) | $1 billion |
| Sora lifespan (launch to shutdown) | ~6 months |
| Anthropic leaked assets from CMS error | ~3,000 unpublished files |
| Big Tech AI infra spend (2026, combined) | ~$700 billion |
| Share going to inference, not training | ~67% (Microsoft Q2 fiscal 2026) |
| Nvidia share of AI training hardware | ~80% |
| GPT-4 cost-per-token reduction since launch | ~100x |
The Compute Math Behind Sora’s Death
The numbers tell a brutal story. Sora was burning approximately $365 million a year in compute against a user base that never cracked a million people. Even if every single user paid for ChatGPT Pro at $200/month, that’s $2.4 billion a year in theoretical maximum revenue—except Sora wasn’t a standalone subscription. It was bundled into existing plans. The actual revenue attributable to Sora was, by multiple analysts’ estimates, roughly $2.1 million total.
Let that ratio sit for a second. $365 million in annualized compute. $2.1 million in revenue. That’s not a business model problem you can iterate your way out of. That’s a physics problem. Video generation at Sora’s quality level requires so much GPU time per minute of output that no commercially viable price point could close the gap.
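To make the gap concrete, here’s a back-of-envelope sketch using the figures cited above. All inputs are the estimates reported in this piece, not confirmed OpenAI numbers, and the breakeven price ignores everything except compute.

```python
# Back-of-envelope Sora unit economics, using the article's estimates.

DAILY_COMPUTE_COST = 1_000_000   # ~$1M/day in GPU costs (WSJ estimate)
TOTAL_REVENUE = 2_100_000        # ~$2.1M total attributable revenue
PEAK_USERS = 1_000_000           # user base later collapsed below 500K

annual_compute_cost = DAILY_COMPUTE_COST * 365          # ~$365M/year
cost_to_revenue_ratio = annual_compute_cost / TOTAL_REVENUE

# What each peak user would have needed to pay per month just to
# cover compute -- before margin, staff, or training costs.
breakeven_monthly_price = annual_compute_cost / (PEAK_USERS * 12)

print(f"Annualized compute: ${annual_compute_cost:,}")
print(f"Compute-to-revenue ratio: {cost_to_revenue_ratio:.0f}x")
print(f"Breakeven per peak user: ${breakeven_monthly_price:.2f}/month")
```

That’s roughly a 174x cost-to-revenue ratio, and a $30+/month compute-only breakeven against a user base that was shrinking, not growing.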
And OpenAI knew. Their own team used the phrase “economically irreconcilable” to describe Sora’s cost structure. Altman made the call: kill it, free up the GPUs, and point them at the products that actually generate revenue, which today means ChatGPT and the API. Everything else got triaged.
“Sora was a money pit that nobody was using, and keeping it alive was costing OpenAI the AI race.” — TechCrunch, citing Wall Street Journal investigation
Here’s why this matters beyond OpenAI: if the best-funded AI company on the planet—$110 billion in fresh capital, the most recognized brand in the space—couldn’t make frontier video generation work economically, the cost problem isn’t going away on its own. It’s structural.
What Mythos’s “Capybara” Tier Tells Us About Pricing
The leaked Anthropic draft was candid in a way that polished press releases never are. Mythos sits in a new tier called Capybara—above Opus, which was already their most expensive offering. The draft states plainly that the model is “very expensive for us to serve, and will be very expensive for our customers to use.”
That’s Anthropic telling us—by accident, but in their own words—that the next leap in capability comes with a price tag that doesn’t fit any current consumer subscription. When they say they’re working on making it “much more efficient before any general release,” the translation is simple: the model works, but they can’t afford to let most people touch it yet.
What does this mean in dollars? If Capybara launches at 5–10x current Opus API pricing, you’re looking at $75–$150 per million input tokens for the most capable model on the market. That prices out individual developers, most startups, and plenty of mid-size companies. Enterprise-only access stops being a strategic choice and starts being the only math that works.
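The arithmetic behind that range can be sketched quickly. The baseline of $15 per million input tokens is my assumption (it’s what makes the article’s 5–10x multiplier land at $75–$150); no Capybara prices have been announced.

```python
# Rough cost projection for a hypothetical "Capybara"-tier API, assuming
# a $15/M-input-token Opus-style baseline (my assumption) and the
# article's 5-10x multiplier. None of these are announced rates.

OPUS_INPUT_PRICE = 15.0  # $ per million input tokens (assumed baseline)

def projected_monthly_cost(tokens_per_month: int, multiplier: float) -> float:
    """Monthly input-token bill at a given Capybara price multiplier."""
    price_per_million = OPUS_INPUT_PRICE * multiplier
    return tokens_per_month / 1_000_000 * price_per_million

# A mid-size app pushing 500M input tokens a month:
low = projected_monthly_cost(500_000_000, 5)    # 5x tier
high = projected_monthly_cost(500_000_000, 10)  # 10x tier
print(f"${low:,.0f} - ${high:,.0f} per month")
```

At that volume you’re looking at a $37,500–$75,000 monthly bill for input tokens alone, which is exactly why this becomes enterprise-only math.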
Anthropic isn’t the only one feeling this. OpenAI’s GPT-5.4 Pro variant is already the most expensive model in the common rankings without a clear performance lead to justify the premium. The cost curve is bending the wrong way for anyone who isn’t an enterprise buyer.
Huang’s AGI Declaration: Follow the Money
Jensen Huang didn’t just say AGI is here. He said it during a week when Nvidia’s entire narrative depends on AI labs needing more compute, not less. When your revenue comes from selling the fuel, you have every incentive to declare the engine is running perfectly.
His definition was revealing: AI that can build a billion-dollar company, even briefly. That’s not the standard research definition, which involves human-level performance across all cognitive domains. It’s a definition that maps neatly onto the current state of his customers’ products. Google DeepMind published a 10-faculty cognitive taxonomy the same week that draws the line much higher: current models are strong in language and narrow reasoning but weak in perception, social cognition, long-term memory, and sustained planning.
Why should you care? Because the AGI narrative drives budgets. If CEOs believe AGI is imminent, they approve bigger GPU orders. Nvidia’s GTC 2026 was dominated by enterprise agentic deployments—production workloads that eat compute on an ongoing basis. Huang wasn’t making a scientific claim. He was making a business case dressed in academic language.
Bull Case vs. Bear Case
🟢 The Bull Case: Efficiency Always Catches Up
History is on this side. GPT-4’s effective cost per token has fallen roughly 100x since launch through distillation, quantization, and hardware improvements. Mistral Small 4 now runs on a single A100—frontier-class performance on hardware you can rent for a few bucks an hour. The labs are spending hundreds of billions on inference infrastructure precisely because they expect unit costs to plummet. Models like Mythos may be expensive today, but so was every frontier model at launch.
If this view is right, the current pricing pressure is a phase, not a destination. The companies investing most aggressively in inference efficiency—Amazon’s Trainium, Google’s TPUs, Cerebras, Groq—will crack the cost curve within 12–18 months. Today’s enterprise-only models become tomorrow’s commodity tier.
🔴 The Bear Case: This Time the Gap Is Structural
But here’s what nags at me. Previous efficiency gains came from making existing architectures cheaper to run. Mythos represents something different: a step up in raw model size. Anthropic describes Capybara as “larger and more intelligent than Opus.” If every capability tier demands proportionally more compute, the efficiency treadmill never catches up. You’re running faster, but the finish line keeps moving.
Sora is the canary. Video generation is the hungriest modality, and it failed the economics test first. But multimodal reasoning, agentic loops that call models hundreds of times per task, real-time interactive AI—they all have similar cost shapes. If the labs keep building bigger without a breakthrough in how we serve these models, we drift toward a world where the best AI is permanently gated by who can write the biggest check.
Where I come down: somewhere in the middle, leaning bear for the next 12 months. Efficiency will improve—it always does. But not fast enough to keep pace with the capability jumps coming. The pricing split is real and it’s happening. Plan for it.
Three Companies Positioned for the Efficiency Era
1. Cerebras Systems
If you want one number that explains Cerebras, it’s 6x. That’s how much faster their CS-3 delivers inference than Groq’s LPU on equivalent models, per independent benchmarks from Artificial Analysis. The company builds wafer-scale chips—single pieces of silicon the size of a dinner plate, packed with memory right next to compute—so there’s no bottleneck shuttling data across chips.
The validation has come fast: a $10 billion deal with OpenAI, a partnership with AWS, and an IPO filing pending. CEO Andrew Feldman’s pitch is simple—the market is fragmenting away from GPU-only infrastructure, and whoever serves tokens fastest and cheapest captures the next wave. I’m watching this one closely.
2. Groq
Groq has been the inference speed benchmark for two years, and Nvidia just paid the ultimate compliment: a licensing deal worth roughly $20 billion at GTC 2026 to pair Groq’s LPUs with their own GPUs. Think about that. Nvidia bought access to a competitor’s architecture because GPUs alone weren’t fast enough for what the market is demanding.
The other number that matters: Groq chips consume roughly a third of the power of equivalent GPUs. For inference workloads—which is where most of the $700 billion in AI spending is actually going—that power efficiency turns directly into margin. They’re expanding into Europe with a Helsinki data center and running GroqCloud as inference-as-a-service.
3. Mistral (the open-source hedge)
This is the play for people who read this deep dive and think: I need to stop being dependent on pricing decisions made in San Francisco boardrooms.
Mistral Small 4 is 22 billion parameters, Apache 2.0 licensed, and runs on a single A100. It outperforms several closed models three to five times its size. No API costs. No vendor lock-in. No surprise pricing tier changes. For the exact scenario this article is about—escalating costs at the frontier—Mistral is the hedge. Capable enough for most real-world workflows, free enough to deploy at any scale you want. Europe’s strongest answer to the American lab pricing squeeze.
Predictions (Time-Stamped)
• By September 2026: At least two major labs (likely Anthropic and OpenAI) introduce pricing tiers 5–10x above current flagship rates for their most capable models.
• By December 2026: “Enterprise-only” becomes a standard launch strategy for frontier models. Consumer access comes months later, if at all.
• By March 2027: At least one more compute-heavy product (likely a real-time interactive agent or video tool) gets killed or restructured for the same reasons as Sora.
• By mid-2027: Custom inference silicon from Cerebras, Groq, or a similar player captures 10%+ of new enterprise inference deployments, up from under 3% today.
What to Do About It: Role-Specific Checklist
If you’re a business leader:
• Audit your AI vendor contracts for pricing escalation clauses. If there’s no cap on rate increases, you’re exposed.
• Stress-test your AI budget against a 3x cost increase for your primary model. Find out what breaks before it breaks.
• Start evaluating open-source alternatives (Mistral, Llama) for non-critical workflows. Do it now, while it’s a choice—not after it’s a scramble.
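The stress test above is a ten-line exercise. Here’s a minimal sketch; the line items and dollar figures are illustrative placeholders, not numbers from this article.

```python
# Minimal AI-budget stress test: what does total spend look like if the
# primary model's price triples? Spend figures are illustrative only.

def stress_test(spend_by_model: dict, primary: str,
                multiplier: float = 3.0) -> float:
    """Return total monthly spend after the primary model's cost scales."""
    stressed = dict(spend_by_model)
    stressed[primary] = stressed[primary] * multiplier
    return sum(stressed.values())

# Hypothetical monthly spend breakdown:
current = {"frontier_api": 12_000, "embeddings": 1_500, "open_infra": 2_000}
total_now = sum(current.values())
total_3x = stress_test(current, "frontier_api")
print(f"Now: ${total_now:,} -> after 3x shock: ${total_3x:,.0f}")
```

If the shocked number breaks your budget, that’s your signal to start the open-source evaluation now rather than later.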
If you’re a developer:
• Build model-agnostic. Abstract your LLM calls behind an interface so you can swap providers without rewriting the app.
• Benchmark Cerebras and Groq for your actual workloads. The speed and cost gaps aren’t incremental—they’re multiples.
• Prototype with Mistral Small 4 or Llama for anything that doesn’t truly need frontier reasoning. Honest answer: most tasks don’t.
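The “build model-agnostic” advice boils down to one interface. A minimal sketch: the provider classes below are stubs I made up for illustration; in practice each would wrap that vendor’s real SDK.

```python
# A minimal provider-agnostic LLM interface: swapping vendors becomes a
# one-line change instead of a rewrite. Providers here are stubs.

from typing import Protocol

class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...

class MistralLocal:
    """Stub standing in for a locally hosted open-weight model."""
    def complete(self, prompt: str) -> str:
        return f"[mistral] {prompt}"

class FrontierAPI:
    """Stub standing in for a hosted frontier-model API."""
    def complete(self, prompt: str) -> str:
        return f"[frontier] {prompt}"

def summarize(client: LLMClient, text: str) -> str:
    # Application code depends only on the interface, never the vendor.
    return client.complete(f"Summarize: {text}")

print(summarize(MistralLocal(), "quarterly report"))
print(summarize(FrontierAPI(), "quarterly report"))
```

The design choice that matters: application code takes the interface as a parameter, so a pricing shock at one vendor is a config change, not a migration project.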
If you’re an AI-curious professional (our core reader):
• Don’t lock into annual AI subscriptions right now. Month-to-month until the pricing dust settles.
• Get comfortable with more than one AI tool. People who depend on a single provider are the ones who get blindsided when pricing shifts.
• Watch inference costs, not benchmark scores. The model that’s cheapest to run at “good enough” quality will matter more than whatever tops the leaderboard this quarter.
Go Deeper
• TechCrunch: “Why OpenAI Really Shut Down Sora” — The WSJ-sourced investigation into Sora’s economics. The Disney detail alone is worth your time.
• Fortune: “Anthropic’s Mythos Model Leak” — The original reporting on the CMS error. Includes excerpts from the draft blog post and Anthropic’s response.
• Fortune: “No One Can Agree on What AGI Means” — Huang’s claim vs. DeepMind’s cognitive taxonomy. Best piece I’ve read on why the definition war actually matters.
• SDxCentral: “Cerebras Spins Nvidia’s Groq Tie-Up” — The inference hardware race from the challengers’ perspective. Feldman’s “size of the straw” metaphor is one you’ll remember.
• Digital Applied: “March 2026 AI Roundup” — Everything that happened this month in one place. Useful if you want the full picture beyond the economics angle.
This deep dive accompanies iPrompt Issue #130. The newsletter has the quick hits, this week’s prompt, the tool pick, and a tip that’ll change how you write prompts.
— R. Lauritsen
Stay curious—and stay paranoid.


