What you get in this FREE Newsletter
In Today’s 5-Minute AI Digest. You will get:
1. The MOST important AI News & research
2. AI Prompt of the week
3. AI Tool of the week
4. AI Tip of the week
…all in a FREE Weekly newsletter.
Sponsor:
Most AI agents demo well. Few ship real work.
Most AI agents can run a task. The problem is everything around it: setup, memory, context, cost, and figuring out what actually happened.
SureThing turns useful AI skills into autonomous agents with business context, persistent memory, cost-aware model selection, and a live dashboard. Paste a link, assign the work, and your agent reports back like a human teammate: what it did, what it cost, what needs your decision, and what happens next.
Built for founders, operators, and marketers who want AI to ship work, not become another tool to babysit.
iPrompt
THE AI NEWSLETTER THAT TURNS NEWS INTO ACTION
ISSUE #136 WEDNESDAY · 20 MAY 2026
THE HOOK
A Dell engineer left an AI agent running overnight. By morning it had eaten 1 billion tokens and a $3,400 cloud bill — for a task that should have cost $30. That slide is now in Dell’s keynote deck. It explains everything happening this week: Claude moving into QuickBooks, 1T-parameter agents shipping under your desk, Gemini opening I/O at the OS layer. The cloud-only era of agentic AI is ending. You’re watching it happen.
AI NEWS ROUNDUP
This week in AI
1 Anthropic shipped Claude for Small Business last Wednesday — a toggle install inside Claude Cowork that drops Claude into QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, and Microsoft 365. Fifteen prebuilt workflows: payroll planning, month-end close, invoice chasing, lead triage. No extra cost beyond a Claude licence. The 15-person HVAC company is now the target customer, not the Fortune 500. Anthropic →
2 Dell shipped agentic AI for under the desk. At Dell Technologies World yesterday, Dell launched Deskside Agentic AI — workstations running 30B-to-1T-parameter agents locally, no cloud API required. The economic claim is the whole story: break-even versus cloud in three months, 87% savings over two years, citing the $3,400 cloud bill from the Hook as the kind of workload these machines absorb. Boring infrastructure with a serious thesis. SiliconANGLE →
3 Google put agentic AI at the Android OS layer. A week ahead of today’s I/O keynote, Google unveiled Gemini Intelligence at its Android Show — an agentic system running across Android 17 at the OS level, not inside an assistant app. Demos: booking a spin class from a calendar notification with no user input, generating custom widgets from a natural-language prompt, multi-step tasks across third-party apps. Pichai takes the Shoreline stage at 10am PT today; we’ll cover what actually shipped on Friday. Android Authority →
4 Hyperscaler AI capex is past $725 billion in 2026. Goldman’s Q1 tally puts the four-hyperscaler total close to triple 2024’s $256B. The grid is now the bottleneck — a projected 45 GW US data-center power shortfall by 2028, and only one-third of 2026’s planned 12 GW under active construction. Microsoft signed a 2 GW nuclear deal. Northern Virginia has effectively halted new permits in several counties. Capital is committed. Megawatts are not. Fortune →
OUR ANGLE
🔭 The cloud-only model is breaking — and the migration just started
Read those four stories as one. Anthropic moved Claude into the local hardware store. Dell put 1-trillion-parameter agents under the desk. Google put Gemini at the Android OS layer. The macro story underneath — a $725B capex cycle running into a 45 GW power wall — explains why all three repositioned in the same week.
Here’s the chain. Agents don’t make one API call — they make hundreds, in loops, often unsupervised. Those loops produce unpredictable token burn, not steady draw. Unpredictable burn breaks the cloud-API cost model the way variable interest rates break a fixed mortgage. So the workload moves closer to where the data already lives — the SMB’s QuickBooks (Anthropic), the workstation under the desk (Dell), the phone’s OS (Google). And the new metric that decides who wins is cost-per-outcome, not benchmark score.
That’s why "privacy" is the marketing line and not the real driver. Privacy was the argument for local AI in 2024. It didn’t move anyone. The token bill in 2026 is moving everyone.
My bet — open to disagreement: by Q1 2027, more than half of agentic AI workloads in mid-market enterprises will run partially or fully on local or embedded substrates rather than pure cloud API. Narrower than "all enterprises," and only for workloads where loops dominate. If you think that’s wrong, reply and tell me where.
THE THREE SPECIALS
Do · Use · Understand
🎯 PROMPT OF THE WEEK
The Workflow Substrate Prompt
Last week’s Agent Brief was about what the agent does. This week’s prompt is about where it does it. With three valid substrates now in play — cloud API, local workstation, embedded-in-SaaS — picking wrong gets expensive fast. Paste a workflow description in, get a real recommendation back.
You are an AI infrastructure strategist. I’ll describe a workflow.
Tell me where it should run: cloud API, local workstation, or
embedded inside existing SaaS (e.g., Claude for Small Business,
Microsoft 365 Copilot).
For each, score 1-10 against these dimensions:
1. TOKEN VOLUME — does this loop heavily, or one-shot?
2. DATA SENSITIVITY — would leaving the building matter?
3. LATENCY TOLERANCE — fire-and-forget, or human watching?
4. CHANGE FREQUENCY — rewriting this every month?
5. WHO RUNS IT — me, ops, or a non-technical owner?
Give me:
- The winning substrate (1 line)
- Why it wins (3 bullets max)
- The break-even point in months vs the runner-up
- The one risk I’m not seeing
End with: "If your real constraint is X, reconsider Y."
WORKFLOW:
[paste workflow description here, 3-5 sentences]
Why it works: Most teams default to whatever stack their AI vendor sells. This forces a real comparison across the three substrates that now exist, scored on the dimensions that actually predict total cost — not benchmark scores, which are useless for this question. The 1-10 scoring also stops the model from giving you a both-sided "it depends" answer.
Where to be careful: Don’t trust the break-even math without sanity-checking the token volume assumption — models are confidently wrong about that number. If the answer is "local workstation," verify by running ten real samples through LM Studio on a laptop before buying hardware.
Works best on: Claude Opus 4.6, GPT-5.
🛠️ TOOL OF THE WEEK
LM Studio
Run AI models on your own laptop. No CLI, no setup, no API key.
★★★★ / 5
Use if: you’re weighing a local-hardware purchase and want to know whether open-weight models actually handle your workflow before you spend. Skip if: your workload is one-shot, low-token, and the privacy benefit isn’t material.
LM Studio is the easiest way to find out how far an open-weight model gets you. Download the app — Mac, Windows, Linux. Browse a catalogue (Llama 3.3, Qwen2.5, DeepSeek-R1, Mistral). Click download. Click load. Chat. About three minutes from install to first response.
Describe it to a colleague: "It’s ChatGPT but the model runs on my MacBook with the wifi off."
Polished GUI — no terminal, no Docker, no Python environment.
OpenAI-compatible local API server, so existing code mostly just works.
Best use case: before any local-hardware purchase, run a week of your real workflow through LM Studio on the laptop you already own. If a 30B open-weight model handles 70% of the task, the Dell pitch is real. If it handles 30%, stay on the cloud.
💡 TIP OF THE WEEK
Measure your agent’s burn rate before you scale it
Here’s the four-step measurement to run before any agent goes to production:
1. Run the workflow ten times on representative inputs with a token meter open.
2. Calculate tokens per outcome — not tokens per call. Outcomes are what you’re paying for.
3. Multiply by your expected real-world volume, then add 40%. Agents always consume more in the wild than in testing.
4. If that number scares you, look at LM Studio or an embedded SaaS workflow before you commit. Don’t scale the agent.
Why the 40% buffer is there: last week I ran a research agent across 40 documents using Claude Opus through the API — the kind of thing I’d happily let run overnight. It burned £180 before I noticed the loop wasn’t terminating cleanly. The bug was in my prompt, not the model. That’s the kind of thing the 40% catches.
Where it doesn’t apply: one-shot chat use. If your team uses AI like a search engine, none of this matters. Burn rate only becomes a problem once you’re running multi-step loops, scheduled jobs, or anything an agent triggers without a human watching.
YOUR MOVE
Pick one. Reply by Friday.
You just learned:
— Claude for Small Business put Claude inside QuickBooks, HubSpot, and the rest — the 15-person business is the new battleground.
— Dell shipped 1T-parameter agents that run under your desk — the cloud-only era of agentic AI is over.
— Burn rate, not benchmarks, is the number that decides where your workflow should run.
Pick one of these three and do it before Friday. Run the Substrate Prompt on a workflow you’re about to scale, install LM Studio and test one open-weight model on it, or measure burn rate on one agent loop you’re already running in production.
Then reply and tell me which one. That’s the action that matters this week — not the share, not the deep dive. Reply. I read every response, and the readers who actually move tend to be the names I start to recognise.
R. Lauritsen
EDITOR · iPROMPT

