Your Boss Will Think You’re an Ecom Genius
You’re optimizing for growth. You need ecom tactics that actually work. Not mushy strategies.
Go-to-Millions is the ecommerce growth newsletter from Ari Murray, packed with tactical insights, smart creative, and marketing that drives revenue.
Every issue is built for operators: clear, punchy, and grounded in what’s working, from product strategy to paid media to conversion lifts.
Subscribe free and get your next growth unlock delivered weekly.
The Week the Model Race Became a Platform War
Opus 4.6, GPT-5.3-Codex, a $285B stock selloff, and what it means for your AI stack
February 5, 2026 was the most consequential day in AI this year. Anthropic and OpenAI launched flagship models within minutes of each other. Software stocks cratered. A coding model earned the first “High” cybersecurity risk rating. And beneath the headlines, two fundamentally different visions for the future of knowledge work came into focus. This deep dive breaks down what happened, what it means, and what to do about it.
TL;DR
• Opus 4.6 and GPT-5.3-Codex launched within minutes of each other. Benchmarks are nearly tied. The performance gap is closing, not widening.
• The real war moved downstream: Anthropic embeds Claude into Office apps, OpenAI builds Frontier for AI agent management. Both want to own the knowledge work OS.
• Cowork plugins triggered a $285B selloff. The market is pricing AI as a SaaS substitute, not a complement.
• GPT-5.3-Codex is the first model rated “High” for cybersecurity risk. It chains exploits end-to-end. API access is delayed.
• We were wrong on Sonnet 5 timing. Anthropic prioritized its enterprise platform over a mid-tier model refresh.
By the Numbers

| Metric | Detail |
|---|---|
| Software stock losses (Feb 3) | $285B wiped out in one session |
| Thomson Reuters | -15.83% (largest single-day drop on record) |
| LegalZoom | -19.68% |
| S&P Software Index YTD | Down ~20%, 8-session losing streak |
| Opus 4.6 vs GPT-5.3 (Terminal-Bench) | 65.4% vs 64.7%, a 0.7-point gap |
| Opus 4.6 context window | 1M tokens (beta), 4x the previous Opus |
| Claude Code revenue run rate | $1B (6 months after launch) |
| Anthropic valuation | $350B (pending $10B round) |
The Selloff: Panic or Repricing?
The trigger wasn’t Opus 4.6. It was the Cowork plugins that shipped three days earlier. Anthropic released industry-specific workflows for legal, finance, and marketing that didn’t generate text—they executed complete processes. Contract review. NDA triage. Compliance checks. End-to-end, minimal human oversight.
The market’s calculation: if a $200/month AI subscription handles even 30% of what a $50,000/year Westlaw license does, the pricing power of every professional SaaS company is in question. Not because AI replaces them tomorrow—but because it commoditizes the easy 30% today, and the hard 70% is next year’s problem.
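That arithmetic is worth making explicit. Here’s a back-of-the-envelope sketch in Python, using the illustrative figures above (not market data):

```python
# All figures are this article's illustrative numbers, not market data.
ai_cost_per_year = 200 * 12          # $200/month AI subscription
saas_cost_per_year = 50_000          # e.g., a Westlaw-class license
fraction_displaced = 0.30            # the "easy 30%" of the workflow

value_displaced = saas_cost_per_year * fraction_displaced
print(f"Displaced value per seat: ${value_displaced:,.0f}/year")
print(f"AI spend per seat:        ${ai_cost_per_year:,.0f}/year")
print(f"Substitution ratio:       {value_displaced / ai_cost_per_year:.1f}x")
# ~$15,000 of displaced value for $2,400 of spend, a roughly 6x gap.
# Pricing power erodes long before the remaining 70% is automated.
```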
“Why do I need to pay for software if internal development now takes developers less time with AI? With Claude Cowork, less technical users are empowered to replace existing workflows.”
The counterargument: Wedbush Securities called the selloff an “Armageddon scenario far from reality.” Enterprises won’t overhaul trillions of embedded data points overnight. That’s true. But the SaaS model runs on pricing power, and pricing power runs on irreplaceability. Every workflow Cowork automates chips away at that moat—whether the full transition takes two years or ten.
The Dueling Launches: When Benchmarks Stop Mattering
February 5: Anthropic shipped Opus 4.6 at ~9:51 AM PST. OpenAI dropped GPT-5.3-Codex within minutes. Both claim state-of-the-art. Both are right—on different tests. And that’s precisely why the tests are becoming irrelevant.
| Benchmark | Opus 4.6 | GPT-5.3-Codex | What It Measures |
|---|---|---|---|
| Terminal-Bench 2.0 | 65.4% | 64.7% | Agentic coding |
| SWE-Bench Pro | — | SOTA | Multi-language SE |
| GDPval-AA | 1606 Elo | 1462 Elo | Knowledge work (finance, legal) |
| ARC AGI 2 | 68.8% | 54.2% (Pro) | Novel problem-solving |
| BigLaw Bench | 90.2% | — | Legal reasoning |
| Context window | 1M (beta) | 400K | Max processable input |
The convergence is the story. Terminal-Bench gap: 0.7 points. Opus 4.5 still edges Opus 4.6 on SWE-Bench Verified (80.9% vs 80.8%)—benchmark saturation at the frontier. When models trade fractions of a point, the differentiator stops being the model and starts being everything around it: platform, integrations, pricing, ecosystem.
“Claude Opus 4.6 excels on the hardest problems. It shows greater persistence, stronger code review, and the ability to stay on long tasks where other models tend to give up.”
Two Competing Visions for AI at Work
Model capabilities are converging. Platform strategies are diverging. Understanding the difference is what separates smart stack decisions from expensive ones.
Anthropic: Disappear Into Your Workflow
Claude in Excel. Claude in PowerPoint. Cowork plugins. Agent teams in Claude Code. The strategy is consistent: don’t make people come to Claude—put Claude where people already are. Anthropic isn’t building a new workplace. It’s augmenting the existing one.
“We noticed a lot of people who are not professional software developers using Claude Code simply because it was a really amazing engine to do tasks.”
What this means for you: lower switching costs today. Your team doesn’t learn a new platform—they get a new capability inside familiar tools. The risk is subtle: Anthropic becomes your invisible infrastructure provider, and you don’t realize the dependency until you try to leave.
OpenAI: Build the New Workplace
Frontier. Codex desktop app. @codex in GitHub PRs and Slack. GPT-5.3-Codex’s mid-task steering. OpenAI is constructing a new interface for managing AI workers as colleagues—assign tasks, check in, redirect, review output. It’s building the org chart of the future with AI agents as direct reports.
What this means for you: higher commitment, higher potential leverage. If you go all-in on Frontier, your AI workflows are deeply integrated and powerful. The risk: switching costs compound every month. You’re building on their platform, not yours.
“It no longer feels like a tool. It feels like a truly capable collaborator.”
The bottom line: Anthropic bets you’ll stay because Claude is woven into everything. OpenAI bets you’ll stay because you’ve reorganized your work around their platform. Both are right. Choose based on your organization’s tolerance for platform dependency—and how quickly you’d need to unwind if one of them changes pricing, policies, or direction.
Three Features That Change Your Work (and One That Changes the Industry)
Agent Teams: From Single-Thread to Swarm
Multiple Claude Code instances coordinating via shared task list, each with its own context window. One Anthropic engineer used 16 agents to build a 100,000-line C compiler over 2,000 sessions. But the real insight isn’t the scale—it’s the principle: LLMs perform worse as context expands. By giving each agent narrow scope and clean context, you get better reasoning within each domain. The security reviewer doesn’t wade through performance notes. Same reason human teams specialize.
Cost reality: 5x+ token consumption. Experimental flag required (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1). Use for high-value parallel work: code review, cross-layer features, competing-hypothesis debugging. Don’t use for sequential tasks or same-file edits.
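If you want to feel out the pattern before enabling the flag, you can approximate narrow-scope reviewers with plain API calls. Here’s a minimal sketch using the Anthropic Python SDK; the model ID is a placeholder taken from this article, and the code mirrors the principle rather than Claude Code’s actual orchestration:

```python
# Narrow-scope reviewers, each with its own clean context. This mirrors
# the principle behind agent teams, not Claude Code's orchestration.
# The model ID is this article's name for the release, used as a placeholder.
import concurrent.futures

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

ROLES = {
    "security": "Review this diff strictly for security vulnerabilities.",
    "performance": "Review this diff strictly for performance regressions.",
    "tests": "Review this diff strictly for missing or weak test coverage.",
}

def review(role: str, system_prompt: str, diff: str) -> tuple[str, str]:
    # One isolated context per reviewer: role prompt plus the diff, nothing else.
    msg = client.messages.create(
        model="claude-opus-4-6",  # placeholder ID
        max_tokens=2048,
        system=system_prompt,
        messages=[{"role": "user", "content": diff}],
    )
    return role, msg.content[0].text

diff = open("change.diff").read()  # the PR diff you want reviewed
with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = [pool.submit(review, r, p, diff) for r, p in ROLES.items()]
    for fut in concurrent.futures.as_completed(futures):
        role, text = fut.result()
        print(f"=== {role} review ===\n{text}\n")
```

Each reviewer consumes its own tokens, which is exactly where the 5x+ figure comes from.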
Adaptive Thinking: The Economics Feature Disguised as a Model Feature
Four effort levels (low, medium, high, max) replace manual extended thinking. Low effort on simple tasks cuts cost and latency by up to 60%; max delivers the highest capability. The migration matters: budget_tokens and thinking: {type: "enabled"} are deprecated on Opus 4.6, and assistant message prefilling now returns 400 errors. Migrate before switching model IDs.
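Here’s a minimal before/after sketch of that migration, assuming the Anthropic Python SDK. The adaptive parameter shape follows this article’s description; the effort field name is an assumption to verify against the official migration guide:

```python
# Before/after migration sketch for adaptive thinking. The "after" shape
# follows this article's description; the effort field name is an
# assumption, so confirm it against Anthropic's migration guide.
import anthropic

client = anthropic.Anthropic()

# Before (deprecated on Opus 4.6, per the article):
#   thinking={"type": "enabled", "budget_tokens": 8000}

# After: adaptive thinking, with effort routed by task class.
EFFORT_BY_TASK = {"formatting": "low", "refactor": "medium", "architecture": "max"}

def ask(task_type: str, prompt: str) -> str:
    msg = client.messages.create(
        model="claude-opus-4-6",        # model ID as named in this article
        max_tokens=4096,
        thinking={"type": "adaptive"},  # replaces {"type": "enabled"}
        # If an effort knob ships as described, it would slot in here, e.g.:
        #   thinking={"type": "adaptive", "effort": EFFORT_BY_TASK[task_type]}
        messages=[{"role": "user", "content": prompt}],
    )
    # The final content block is the answer (earlier blocks may be thinking).
    return msg.content[-1].text

print(ask("formatting", "Normalize this date to ISO 8601: 5 Feb 2026"))
```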
Claude in PowerPoint: The Office Trojan Horse
Available in research preview for Max, Team, Enterprise. Claude reads slide masters, layouts, fonts, and colors, then generates or edits slides that stay on-brand. The feature itself is useful. The strategy it represents is transformative: Anthropic is using Office integrations to embed Claude into the daily rhythm of knowledge work—the place where SaaS disruption actually happens.
The Cybersecurity Threshold: GPT-5.3-Codex Crosses a Line
The first model OpenAI classifies as “High capability” for cybersecurity. On their Cyber Range, it solved nearly every scenario, including a binary-exploitation challenge where the model independently discovered an attack path, reverse-engineered a binary, and executed the exploit without being told to look. OpenAI delayed API access and committed $10M in defense research credits.
What to do now: if your team uses AI coding assistants, audit what code and infrastructure they have access to. The attack surface isn’t hypothetical anymore—it’s benchmarked. Expect “High capability” cybersecurity ratings to become standard safety disclosures for every frontier release.
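A crude but useful starting point for that audit: enumerate the secret-shaped files and environment variables reachable from wherever your assistant runs. The patterns below are illustrative, not exhaustive:

```python
# Quick audit: list secret-shaped files and env vars an AI coding
# assistant could read if it shares your working tree and shell.
# Patterns are illustrative; extend them for your stack.
import os
import pathlib
import re

FILE_HINTS = re.compile(r"(\.env|id_rsa|\.pem|credentials|secret)", re.I)
VAR_HINTS = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.I)

repo = pathlib.Path(".")
file_hits = [p for p in repo.rglob("*") if p.is_file() and FILE_HINTS.search(p.name)]
env_hits = [k for k in os.environ if VAR_HINTS.search(k)]

print(f"Secret-shaped files under {repo.resolve()}:")
for p in file_hits:
    print(f"  {p}")
print(f"Secret-shaped environment variables: {env_hits}")
# Everything listed is in scope for any tool you run from this shell.
# Restrict by default: scoped tokens, separate keys, sandboxed checkouts.
```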
About That Sonnet 5 Prediction
We predicted Sonnet 5 would launch alongside Opus 4.6. We were wrong. Anthropic focused on enterprise platform features—Office integrations, Cowork plugins, agent teams—over a mid-tier model refresh.
From early leaks: codename “Fennec,” reportedly 50% cheaper than Opus 4.5, 80.9% SWE-Bench, 1M context. The sequencing is strategic: ship the platform story with Opus 4.6, then follow with the cost-effective workhorse once the ecosystem is established. Lesson: Anthropic’s release cadence is now driven by platform strategy, not model capability.
Your Action Checklist
Test agent teams on a real code review. Enable the experimental flag. Spin up three reviewers (security, performance, tests). Compare to single-session output. The quality gap on complex PRs is significant.
Switch to adaptive thinking. Replace thinking: {type: "enabled"} with {type: "adaptive"}. Route effort by task complexity for immediate cost savings. One team reported a 40% API cost reduction.
Audit your AI tools’ access scope. GPT-5.3-Codex’s cybersecurity rating means AI-assisted exploits are no longer theoretical. Review what code, secrets, and infrastructure your AI coding assistants can reach. Restrict by default.
Try Claude in PowerPoint. Install from Microsoft Marketplace. Test against your corporate template. If it handles your brand guidelines, the time savings on deck-building are substantial.
Prepare for Sonnet 5. At 50% of Opus 4.5 price with comparable performance, it’ll become most teams’ default model. Plan your migration path now.
Go Deeper
Opus 4.6 blog: anthropic.com/news/claude-opus-4-6
GPT-5.3-Codex: openai.com/index/introducing-gpt-5-3-codex
Agent teams docs: code.claude.com/docs/en/agent-teams
Claude in PowerPoint: claude.com/claude-in-powerpoint
GPT-5.3-Codex system card: openai.com/index/gpt-5-3-codex-system-card
Agent teams deep walkthrough: addyosmani.com/blog/claude-code-agent-teams
The model race gave us better AI. The platform war will decide who controls how we use it. The choices you make in the next 90 days—which tools you embed, which workflows you automate, which vendor you bet on—will compound long after this week’s benchmarks are forgotten. Choose deliberately.
This deep dive accompanies the iPrompt newsletter, Week of February 10, 2026.
Stay curious—and stay paranoid.
— R. Lauritsen