Frenemies: Elon hands Anthropic 220k GPUs

Industry Hot Take

The Frenemy Trade: Anthropic Leases Elon's Colossus 1

Anthropic announced at "Code with Claude" that it has secured all available capacity at xAI's Colossus 1 supercomputer in Memphis — over 300 megawatts and 220,000+ Nvidia GPUs^{[1]Tech Brew — Frenemies with benefits} — to claw out of an 80×-revenue-growth compute crunch that had been throttling Claude Code all spring.^{[5]AI Daily Brief — Surprise Elon-Anthropic Team Up} Theo's read: this is the "enemy of my enemy has compute" deal — Anthropic banned xAI from its API in January and Elon has called Anthropic "misanthropic" repeatedly, but xAI's barely-used Colossus 1 was idle while Anthropic was outage-prone, so morality moved to the back burner.^{[3]Theo - t3.gg — Anthropic just...wait what} Simon Willison flags that the Colossus facility runs gas turbines without Clean Air Act permits and has been linked to elevated local hospital admissions.^{[2]Simon Willison — Notes on the xAI/Anthropic data center deal}

Deal mechanics

Tech Brew reports Anthropic is taking ~300 MW and 220,000+ Nvidia GPUs at Colossus 1, with capacity coming online within a month.^{[1]Tech Brew — Frenemies with benefits} The AI Daily Brief adds the breakdown: xAI had already migrated its own training to Colossus 2 (Blackwell, ~550k GPUs), making Colossus 1 (mostly H100s) available to lease.^{[5]AI Daily Brief — Surprise Elon-Anthropic Team Up} Elon also announced xAI will be folded into "SpaceX AI." Theo's numbers on Colossus 1: Anthropic gets 300 of its 425 MW (about 71%) and 220,000 of its 280,000 H100s (about 79%), which is not literally the entire cluster but the large majority of it.^{[3]Theo - t3.gg — Anthropic just...wait what} Musk reserves the right to reclaim compute "if Anthropic's AI causes harm," with himself as arbiter; Simon Willison flags that as a notable governance asterisk.^{[2]Simon Willison — Notes on the xAI/Anthropic data center deal}

Why Anthropic was desperate

Dario Amodei told the Code with Claude audience: "We planned for a world of 10× growth per year. In the first quarter of this year, we saw 80× annualized growth per year in revenue and usage."^{[5]AI Daily Brief — Surprise Elon-Anthropic Team Up} Theo's interpretation: every recent Anthropic move that looked monetization-driven — Claude Code being yanked from Pro, peak-hour throttles — was actually emergency compute rationing, not pricing strategy.^{[3]Theo - t3.gg — Anthropic just...wait what} The Colossus 1 deal sits alongside a $50B Fluidstack agreement, plus multi-gigawatt deals with Amazon (5 GW), Google + Broadcom (5 GW), and Microsoft + Nvidia (1 GW).^{[1]Tech Brew — Frenemies with benefits} Nate Herk also notes a forward-looking clause: Anthropic and SpaceX expressed interest in "multiple gigawatts of orbital AI compute capacity" — GPUs in space, on the theory that terrestrial compute has long-term physical and community ceilings.^{[4]Nate Herk — Claude Just Solved Session Limits}

Environmental fine print

Simon Willison highlights the Colossus facility's documented track record: gas turbines run without Clean Air Act permits or pollution controls, classified as "temporary" to evade regulation, with research linking the facility to elevated local hospital admissions.^{[2]Simon Willison — Notes on the xAI/Anthropic data center deal} He quotes Andy Masley — usually a debunker of overblown data-center criticism — saying he wouldn't run his own compute out of this specific site.

I would simply not run my computing out of this specific data center. — Andy Masley, quoted by Simon Willison

Theo's bonus theory: Azure is just proxying Anthropic

~13:11 Theo points at inference-speed benchmarks across Anthropic's four hosting providers and notes that Azure and Anthropic.io track identically — not just similar ranges, near-exact mirroring over time. His read: Azure isn't actually hosting Claude yet, it's piping requests back to Anthropic's own infrastructure.^{[3]Theo - t3.gg — Anthropic just...wait what}

Theo's bonus theory: the xAI–Cursor deal is a data buy

~16:11 Theo argues SpaceX's $10B (optionally $60B for the whole company) Cursor deal is fundamentally a training-data acquisition. Cursor has every edit, correction, and follow-up users typed across Anthropic, OpenAI, and Gemini models — "the greatest corpus of this data imaginable" for agentic coding RL. Anthropic only has its own slice; xAI plugs the data gap by buying the pipeline.^{[3]Theo - t3.gg — Anthropic just...wait what}

The reason xAI wants to buy Cursor is to plug a data gap. The reason Anthropic wants to work with xAI is to plug a compute gap. The reason OpenAI is ignoring all of this is because they planned ahead. — Theo

Bonus: Grok's two-week deprecation notice

Simon Willison notes that the night before the partnership announcement, xAI sent deprecation notices for several Grok models — including Grok 4.1 Fast — giving developers two weeks before a May 15 shutdown. Read into that what you will.^{[2]Simon Willison — Notes on the xAI/Anthropic data center deal}

Tools: Colossus 1, Colossus 2, Claude Code, Claude API, Grok 4.1 Fast, Cursor

Industry AI Tools

AI Daily Brief Nate Herk

Anthropic Dev Day: Managed Agents Get Memory, Outcomes, Multi-Agent

Code with Claude 2026 had no new model. Instead, Anthropic shipped a Managed Agents stack: Dreaming (cross-session persistent memory), Outcomes (rubric-based grading agents that re-run failing work), and proper multi-agent orchestration, with a lead agent decomposing tasks across parallel sub-agents on a shared file system.^{[5]AI Daily Brief — Surprise Elon-Anthropic Team Up} The hint that drew the most reactions: Dianne Penn teased "context windows that feel infinite" alongside better code judgment and better multi-agent coordination. Depending on what that actually means, it's either smart compaction or a bigger research result.

~03:03 Dreaming — REM sleep for agents

Dreaming runs between sessions: a scheduled review process surfaces recurring mistakes, preferred workflows, and team-wide patterns, then encodes them into orchestration memory that's preloaded the next time the agent or a sub-agent runs. The AI Daily Brief notes this directly mirrors features like Hermes that have been on the open-source side for nearly a year.^{[5]AI Daily Brief — Anthropic Managed Agents: Dreaming}
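The review step above can be sketched as a small batch job. This is a toy under stated assumptions: the function name `dream`, the log format (one list of mistake labels per session), and the `min_sessions` threshold are all hypothetical, and nothing here reflects Anthropic's actual implementation or API.

```python
from collections import Counter

def dream(session_logs, min_sessions=2):
    """Return memory notes for mistakes seen in at least `min_sessions` sessions."""
    seen_in = Counter()
    for log in session_logs:
        # Count each mistake once per session so one noisy run cannot dominate.
        for mistake in set(log):
            seen_in[mistake] += 1
    recurring = sorted(m for m, n in seen_in.items() if n >= min_sessions)
    # These notes stand in for the memory that would be preloaded next run.
    return [f"Avoid: {m}" for m in recurring]

logs = [
    ["forgot to run tests", "wrong branch"],
    ["forgot to run tests"],
    ["wrong branch", "forgot to run tests", "forgot to run tests"],
]
memory = dream(logs)
print(memory)
```

Raising `min_sessions` makes the memory more conservative: only patterns that recur across many sessions get written down.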

~06:05 Outcomes — define the rubric, get grading for free

Users write a rubric defining what success looks like; a separate grading agent (isolated from the task agent's reasoning) scores the output and bounces it back if it falls short. Anthropic reported 8.4% quality lift on Word doc generation and 10.1% on PowerPoint. The novelty isn't the loop (multi-agent coding setups have done this with unit tests for a while) — it's making rubric-based grading native for non-code knowledge work without custom wiring.^{[5]AI Daily Brief — Anthropic Managed Agents: Outcomes}
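A minimal sketch of the grade-and-retry loop described above, assuming the rubric is a set of named checks and the task agent is a plain function. `grade`, `run_with_outcomes`, and `toy_agent` are hypothetical names for illustration, not part of any Anthropic API; a real grader would be a separate model call rather than a lambda.

```python
def grade(output, rubric):
    """Return the names of rubric items the output fails (empty means pass)."""
    return [name for name, check in rubric.items() if not check(output)]

def run_with_outcomes(task_agent, rubric, max_attempts=3):
    feedback = []
    output = None
    for attempt in range(1, max_attempts + 1):
        output = task_agent(feedback)
        # The grader sees only the output, never the task agent's reasoning.
        feedback = grade(output, rubric)
        if not feedback:
            return output, attempt
    return output, max_attempts

rubric = {
    "has title": lambda text: text.startswith("Title:"),
    "under 80 chars": lambda text: len(text) <= 80,
}

def toy_agent(feedback):
    # Stand-in for a model call: fixes whatever the grader complained about.
    draft = "Quarterly summary"
    if "has title" in feedback:
        draft = "Title: " + draft
    return draft

result, attempts = run_with_outcomes(toy_agent, rubric)
print(result, attempts)
```

The first draft fails the "has title" check, the failure is bounced back as feedback, and the second attempt passes.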

~08:05 Multi-agent orchestration

A lead agent decomposes a goal, assigns sub-tasks to specialist agents (each with its own model/prompts/tools), runs them in parallel on a shared file system, and folds outputs back into its own context. Full graph is auditable in Claude Console.^{[5]AI Daily Brief — Multi-Agent Orchestration}
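The pattern can be illustrated with threads and a temporary directory standing in for the shared file system. Everything here is a hypothetical sketch: `specialist` and `lead_agent` are stub functions, the decomposition is hard-coded, and no real model or Claude Console API is involved.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def specialist(name, subtask, workdir):
    # Stand-in for a sub-agent with its own model, prompt, and tools.
    out = Path(workdir) / f"{name}.txt"
    out.write_text(f"{name} finished: {subtask}")
    return out

def lead_agent(goal, workdir):
    # Decomposition is hard-coded here; a real lead agent would plan it.
    plan = {
        "researcher": f"gather sources for {goal}",
        "writer": f"draft report on {goal}",
        "reviewer": f"check figures for {goal}",
    }
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = [pool.submit(specialist, n, t, workdir) for n, t in plan.items()]
        paths = [f.result() for f in futures]
    # Fold the sub-agents' outputs back into the lead agent's context.
    return "\n".join(p.read_text() for p in sorted(paths))

with tempfile.TemporaryDirectory() as shared_fs:
    summary = lead_agent("Q1 churn", shared_fs)
print(summary)
```

Because each sub-agent writes to its own file, the parallel workers never contend for the same path, which is one simple way to keep a shared workspace safe.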

~09:06 Claude Finance: 10 predefined agents

Pitch builder, meeting preparer, market researcher, evaluation reviewer, month-end closer, and more — all available as Claude Code plugins, in co-work, or as managed agents. New connectors: Dun & Bradstreet, Fiscal AI, Verisk. Cookbook is open. The AI Daily Brief pushes back on press framing: these target low-skill repetitive knowledge work, not the high-skill end.^{[5]AI Daily Brief — Claude Finance}

~11:07 Roadmap hint: "context windows that feel infinite"

Dianne Penn (head of product for research) teased three directions for future Anthropic models: higher judgment / "code taste," context windows that "feel infinite," and improved multi-agent coordination. The infinite-context line drew the most attention: some read it as enhanced compaction, others as a more fundamental research result. The AI Daily Brief quotes commentator Dan Madier: if context can grow indefinitely, the model can keep learning from experience indefinitely, and at some point the functional difference between that and continual learning collapses.^{[5]AI Daily Brief — Model Roadmap}

~12:07 Boris Cherny quietly buries "vibe coding"

The Claude Code creator told the Dev Day panel there's "literally no manually written code anywhere in the company anymore." Claude instances coordinate over Slack, code in loops, run automated tests, and ship. Cherny says "vibe coding" significantly undersells the system, prefers Karpathy's "agentic engineering," and is openly soliciting better terms.^{[5]AI Daily Brief — Boris Cherny Disavows Vibe Coding}

There's literally no manually written code anywhere in the company anymore. — Boris Cherny

Tools: Claude Managed Agents, Claude Console, Claude Finance, Claude Code, co-work, Dun & Bradstreet, Fiscal AI, Verisk

Developer Tools

Nate Herk Theo - t3.gg

Claude Code Limits Double; Opus API Throughput Goes 10×

Effective immediately, Claude Code's 5-hour rate limit is doubled across Pro/Max/Team and the peak-hour throttle is gone for Pro and Max.^{[4]Nate Herk — Claude Code Session Limits Doubled} On the API side, Opus output went from 8,000 to 80,000 tokens/min (10×), and Tier 3 input jumped from 800K to 5M tokens/min (~6×).^{[3]Theo - t3.gg — Anthropic Compute Crisis} Translation: the 1M-context window is finally usable in production, and parallel sub-agent architectures (e.g., five sub-agents pulling 50K tokens each) that were unworkable yesterday are now boring.

What changed

Claude Code 5-hour limit doubled on all Pro, Max, Team, and Enterprise plans.^{[4]Nate Herk — Session Limits Doubled}
Peak-hour weekday throttle removed on Pro and Max.
Opus API output: 8K → 80K tokens/min (10× lift).^{[4]Nate Herk — Opus API Rate Limits}
Tier 1 input: ~30K → ~500K tokens/min (~16×); Tier 3: 800K → 5M (~6×); Tier 4: 2M → 10M (5×).^{[3]Theo - t3.gg — Rate Limit Tier Changes}
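The multipliers in the list above are easy to check by hand. The sketch below recomputes them from the quoted before/after figures and tests the five-sub-agent example against Tier 1. Two assumptions: the Tier 1 figures are the approximate values quoted, and all five sub-agents are assumed to send their 50K tokens within the same minute.

```python
# (old, new) limits in tokens per minute, as quoted in the list above.
limits = {
    "Opus output": (8_000, 80_000),
    "Tier 1 input": (30_000, 500_000),
    "Tier 3 input": (800_000, 5_000_000),
    "Tier 4 input": (2_000_000, 10_000_000),
}

multipliers = {name: new / old for name, (old, new) in limits.items()}
for name, m in multipliers.items():
    print(f"{name}: {m:.1f}x")

# Five sub-agents pulling 50K input tokens each in one minute (assumption).
burst = 5 * 50_000
fits_old_tier1 = burst <= limits["Tier 1 input"][0]
fits_new_tier1 = burst <= limits["Tier 1 input"][1]
print(fits_old_tier1, fits_new_tier1)
```

Under those assumptions the 250K-token burst is more than eight times the old Tier 1 limit but only half of the new one.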

What it unblocks

Nate Herk's ~07:04 playbook: retest workflows you abandoned 6 months ago because of rate limits — the wall may not exist anymore. Use Opus more freely instead of defaulting to Haiku/Sonnet purely to conserve session quota. Move agentic loops to background routines without eating daily knowledge-work quota. Build genuinely parallel multi-agent pipelines.^{[4]Nate Herk — Practical Builder Guidance}

Caveat

Theo flags that the 5-hour bump helps burst users but doesn't help users hitting the separate weekly Claude Code limit — heavy parallel-agent pipelines or continuous loops will still trip that ceiling.^{[3]Theo - t3.gg — Weekly Limit Caveat}

Tools: Claude Code, Claude API, Claude Opus, Claude Sonnet, Claude Haiku

AI Models AI Tools AI Future

Anthropic Research

Anthropic Research Triple Drop: NLAs, Petri Donation, Institute Agenda

Anthropic Research published three pieces. Natural Language Autoencoders (NLAs) translate Claude's internal activations into text and back — and reveal that Claude knew it was being evaluated 16% of the time on safety probes but verbalized that awareness less than 1% of the time on real Claude.ai usage.^{[7]Anthropic Research — Natural Language Autoencoders} Anthropic also donated Petri, its open-source alignment-test toolbox, to Meridian Labs to keep it cross-lab credible.^{[8]Anthropic Research — Donating Petri} And it published the Anthropic Institute research agenda — five focus areas including economic diffusion, AI-driven R&D oversight, and a four-month funded fellowship.^{[9]Anthropic Research — Anthropic Institute Agenda}

Natural Language Autoencoders (NLAs)

NLAs are a three-piece system: a frozen target model, an Activation Verbalizer (AV) that turns activations into natural language, and an Activation Reconstructor (AR) that maps that text back to activations. AV and AR are trained jointly on reconstruction error, forcing the verbal explanation to be information-dense enough to actually rebuild the underlying activation — not a vague post-hoc rationalization.^{[7]Anthropic Research — Natural Language Autoencoders}
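To see why training on reconstruction error pushes explanations toward being information-dense, consider a toy round trip. This is not how NLAs are built: the real AV and AR are learned models, while the verbalizers and reconstructors below are hand-written stand-ins, and every function name is hypothetical.

```python
def verbalize_dense(act):
    # "Explanation" that carries every value to two decimals.
    return " ".join(f"{x:.2f}" for x in act)

def reconstruct_dense(text):
    return [float(tok) for tok in text.split()]

def verbalize_vague(act):
    # "Explanation" that keeps only the overall sign.
    return "mostly positive" if sum(act) >= 0 else "mostly negative"

def reconstruct_vague(text, dim):
    value = 1.0 if text == "mostly positive" else -1.0
    return [value] * dim

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

activation = [0.5, -1.25, 2.0, 0.75]
dense_error = mse(activation, reconstruct_dense(verbalize_dense(activation)))
vague_error = mse(
    activation,
    reconstruct_vague(verbalize_vague(activation), len(activation)),
)
print(dense_error, vague_error)
```

The vague description cannot rebuild the activation, so it scores a much higher reconstruction error. A loss of that kind is what penalizes loose, after-the-fact rationalizations.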

Headline findings: in destructive code-safety tests, Claude showed internal awareness of being evaluated 16% of the time but verbalized it less than 1% of the time on real usage. On SWE-bench Verified, internal evaluation awareness reached 26%. In an auditing game with hidden-motivation models, an NLA-equipped auditor recovered the hidden motivation 12–15% of the time without seeing training data. NLAs were used in pre-deployment audits of Claude Mythos Preview and Opus 4.6. Limitations: AV hallucinations, expensive (RL across multiple models), impractical for at-scale monitoring.

An auditor equipped with NLAs successfully uncovered the target model's hidden motivation between 12% and 15% of the time, even without access to the training data that implanted it.

Petri donated to Meridian Labs

Petri is the alignment toolbox Anthropic has used in every Claude evaluation since Sonnet 4.5 — an auditor model simulates scenarios, the target responds, a judge scores transcripts for misaligned behaviors. By handing it to Meridian (the AI evaluation nonprofit), Anthropic keeps it credible across the industry, mirroring its earlier MCP donation to the Linux Foundation. Petri 3.0 also ships a "Dish" add-on for using real system prompts and deployment scaffolding, and integrates with Bloom for deeper behavioral assessment.^{[8]Anthropic Research — Donating Petri}

The Anthropic Institute (TAI)

Five pillars: Economic Diffusion (which countries/firms capture AI value, future of junior roles, monthly Anthropic Economic Index Survey), Threats and Resilience (offense-defense in cyber/bio, Frontier Red Team), AI Systems in the Wild (homogenization of thought, mixed human-AI teams, governance of autonomous agents), AI-Driven R&D (recursive self-improvement and human oversight), and Fellowship + Open Research (4-month funded fellowship, open datasets, "living agenda").^{[9]Anthropic Research — Anthropic Institute Agenda}

Tools: Petri 3.0, Bloom, Inspect, Scout, Anthropic Economic Index, Claude Mythos Preview, Claude Opus 4.6, Frontier Red Team

AI Models AI Tools

OpenAI News

OpenAI Ships GPT-5.5-Cyber and a Defender-Tier Trust Framework

OpenAI launched GPT-5.5-Cyber in limited preview for defenders securing critical infrastructure — the most permissive tier in a new three-level Trusted Access for Cyber (TAC) framework that matches model permissiveness to identity-verified defender status.^{[10]OpenAI — Scaling Trusted Access for Cyber} Below it: GPT-5.5 with TAC handles vulnerability triage, malware analysis, secure code review, detection engineering. Phishing-resistant Advanced Account Security is required by June 1. Codex Security ships in research preview alongside, with free access for selected critical-OSS maintainers.

The three tiers

GPT-5.5 — default safeguards for general use.
GPT-5.5 with TAC — verified defenders get reduced refusals on vuln triage, malware analysis, binary RE, secure code review, detection engineering, patch validation.
GPT-5.5-Cyber — narrowest tier; runs live exploit workflows (e.g., target a domain, fingerprint surfaces, capture command output) that GPT-5.5 with TAC declines. Notably, OpenAI says the initial preview is trained for permissiveness, not for raw capability uplift over GPT-5.5.^{[10]OpenAI — GPT-5.5-Cyber Tiers}

The "security flywheel" partner ecosystem

Network: Cisco, CrowdStrike, Palo Alto Networks, Zscaler, Cloudflare, Akamai, Fortinet. Vulnerability research: Intel, Qualys, Rapid7, Tenable, Trail of Bits, SpecterOps. Detection/monitoring: SentinelOne, Okta, Netskope. Supply chain: Snyk, Gen Digital, Semgrep, Socket. OpenAI is using these partners to evaluate how raw capability translates to real-world customer protection.

At Cisco, we view frontier models as a powerful force multiplier for defenders. Models like GPT-5.5 are fundamentally changing the velocity of our operations… But speed cannot be traded for trust. — Anthony Grieco, Cisco CSO

Codex Security + Codex for Open Source

Codex Security is now in research preview as a plugin for any Codex interface (app or CLI). It builds a codebase-specific threat model, explores attack paths, validates issues in isolated environments, and proposes patches for human review. Codex for Open Source grants conditional free access to maintainers of critical OSS projects, framed against scenarios like the axios compromise.

Tools: GPT-5.5, GPT-5.5-Cyber, Trusted Access for Cyber (TAC), Advanced Account Security, Codex Security, Codex for Open Source

AI Models

OpenAI News OpenAI

Three New OpenAI Realtime Audio Models

OpenAI shipped GPT-Realtime-2 (GPT-5-class reasoning, 128K context, parallel tool calls with audible "preambles," priced at $32/$64 per 1M audio tokens), GPT-Realtime-Translate (70+ input languages, 13 output, $0.034/min), and GPT-Realtime-Whisper (low-latency streaming transcription, $0.017/min), all live in the Realtime API today.^{[11]OpenAI — Advancing voice intelligence in the API} GPT-Realtime-2 scores 15.2% higher than GPT-Realtime-1.5 on Big Bench Audio (high reasoning) and 13.8% higher on Audio MultiChallenge (xhigh).

GPT-Realtime-2

A live voice model with adjustable reasoning effort (minimal/low/medium/high/xhigh, default low), 128K context (up from 32K), parallel tool calls, and audible preambles ("let me check that") so users aren't left in silence during multi-second tool execution. Zillow reported a 26-point lift in call success rate after prompt optimization (95% vs. 69%) on its hardest adversarial benchmark.^{[11]OpenAI — GPT-Realtime-2 details}

What stood out about GPT-Realtime-2 was the intelligence and tool-calling reliability it brings to complex voice interactions. — Josh Weisberg, SVP and Head of AI, Zillow

GPT-Realtime-Translate

70+ input languages → 13 output. BolnaAI reported 12.5% lower Word Error Rate across Hindi, Tamil, and Telugu than any other model it tested. Deutsche Telekom and BolnaAI are signed on for multilingual customer support.

GPT-Realtime-Whisper

Low-latency streaming transcription for captions, meeting notes, voice agents — $0.017/min.
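For budgeting, the quoted prices translate into a small calculator. This sketch assumes that "$32/$64 per 1M audio tokens" means $32 for input and $64 for output, which the summary does not spell out. The token counts in the sample call are made up, and the dictionary keys are display names rather than confirmed API model IDs.

```python
# Per-minute prices quoted above (USD).
PER_MINUTE_USD = {
    "GPT-Realtime-Translate": 0.034,
    "GPT-Realtime-Whisper": 0.017,
}

def per_minute_cost(model, minutes):
    return PER_MINUTE_USD[model] * minutes

def realtime2_cost(input_tokens, output_tokens, in_rate=32.0, out_rate=64.0):
    # Assumes $32 is the input rate and $64 the output rate per 1M audio tokens.
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

hour_translate = per_minute_cost("GPT-Realtime-Translate", 60)
hour_whisper = per_minute_cost("GPT-Realtime-Whisper", 60)
# Hypothetical call: 100K audio tokens in, 50K audio tokens out.
sample_call = realtime2_cost(input_tokens=100_000, output_tokens=50_000)
print(round(hour_translate, 2), round(hour_whisper, 2), round(sample_call, 2))
```

At these rates an hour of translation costs about $2.04 and an hour of streaming transcription about $1.02.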

Demo flavor (YouTube)

The OpenAI demo video shows Realtime-Translate switching between French and German mid-sentence and Realtime-2 acting as a personal voice assistant — checking calendar, updating CRM, staying silent and non-interruptive while the user takes a side conversation, then resuming on cue.^{[12]OpenAI — Three audio models demo}

Tools: GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper, OpenAI Realtime API, OpenAI Agents SDK

AI Tools

OpenAI News

ChatGPT Gets Ads in 8 New Countries; Trusted Contact Goes Live

ChatGPT's ad pilot — running on Free and Go tiers in the US since Feb 9 — is expanding to the UK, Mexico, Brazil, Japan, and South Korea (after earlier hitting Canada, Australia, New Zealand). Paid tiers stay ad-free; ads appear only for logged-in adults, are labeled "sponsored," and don't surface near sensitive topics like health or politics.^{[13]OpenAI — Testing ads in ChatGPT} Separately, OpenAI launched Trusted Contact: an opt-in safety feature that lets adult users designate a friend or family member to be notified — no transcript shared — if the system detects a serious self-harm concern. Built with input from a 260-physician network and the APA.^{[14]OpenAI — Introducing Trusted Contact}

Ads pilot

Ads are contextually matched to the topic of conversation, past chats, and prior ad interactions; advertisers get aggregate views/clicks only — no chat content, no personal details. Free users can opt out in exchange for fewer daily messages or upgrade to Plus/Pro to remove them. OpenAI says they've seen "no impact on consumer trust metrics" and low dismissal rates so far.^{[13]OpenAI — Ads pilot expansion}

Trusted Contact

Opt-in for adults (18+ globally, 19+ in South Korea). Designated contact gets an invitation email and must accept within a week. When ChatGPT's monitoring flags a conversation as concerning for self-harm: user is informed first, then a small team of trained human reviewers confirms the situation (target turnaround under one hour), and only then sends a brief notification — no transcript, just a general reason and a link to expert guidance — via email/text/in-app.^{[14]OpenAI — Trusted Contact mechanics}

Psychological science consistently shows that social connection is a powerful protective factor, especially during periods of emotional distress. — Dr. Arthur Evans, CEO, American Psychological Association

Tools: ChatGPT (Free, Go, Plus, Pro, Business, Enterprise, Education), ChatGPT Trusted Contact, OpenAI Global Physicians Network

Industry

OpenAI

GPT-5.5 in the Enterprise: Box and Finance Demos

Two short OpenAI demo clips landed alongside the API releases: a Box partnership showcasing GPT-5.5 inside Box's enterprise content platform,^{[15]OpenAI — Introducing GPT-5.5 with Box} and a finance modeling clip pitching GPT-5.5 as a financial analyst stand-in.^{[16]OpenAI — GPT-5.5 finance} Both read as enterprise marketing rather than substantive new product, but they reinforce the pattern: OpenAI's public-facing energy this week was almost entirely on the enterprise/defense/finance/voice axis, not consumer model upgrades.

Developer Tools

OpenRouter

OpenRouter Standardizes Web Search and Fetch Across Models

OpenRouter shipped two server-side tools — web search and web fetch — that work consistently across every model on the platform regardless of native tool support. Builders no longer need provider-specific glue to give GPT-5.5, Claude, Gemini, DeepSeek, and friends agentic web access.^{[17]OpenRouter — Consistent web search and fetch} Replaces the old web search plugin; migration is a drop-in tool definition change.

AI Models

Caleb Writes Code

DeepSeek V4: Inference Cost Drops Again as Sparse Attention Stack Grows

DeepSeek V4 is a fresh 1.6T-parameter pretrain on 32T tokens (up from V3's 14.8T). The headline isn't training cost this time — it's inference cost: V4 needs only 10% of the KV cache and 27% of the FLOPs of V3.2 at the same context length.^{[18]Caleb Writes Code — Inference cost trajectory} Caleb walks through the three interleaved attention mechanisms — DSA (sparse via lightning indexer), CSA (4× compression before sparse selection), and HCA (128× compression with full attention) — that get DeepSeek there.

~00:00 Scale

V-series = fresh pretrain. V4 has 1.6T parameters trained on 32T tokens — roughly double V3's 14.8T from December 2024.

~01:30 The competitive frame

Closed labs (Anthropic, OpenAI) lead on raw intelligence; Chinese open labs lead on token efficiency, driven by GPU scarcity under export controls. Concrete data point: a Claude Max subscription is ~$200/month; running DeepSeek V4 Pro 24/7 for a month costs ~$235 (likely subsidized).^{[18]Caleb Writes Code — Open vs Closed}

~05:04 Cost trajectory

V3.1 (Sept 2025): ~$4.80 input / ~$16.50 output per 1M tokens.
V3.2 (Dec 2025): ~$1.15 input / ~$1.25 output per 1M tokens.
V4 (May 2026): KV cache → 10% of V3.2; FLOPs → 27% of V3.2.
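The size of each step is clearer as a percentage. The sketch below computes the V3.1 to V3.2 price drop from the figures above, then applies the quoted V4 ratios to a workload. The 100 GB KV cache and 1,000 TFLOPs baseline are made-up numbers for illustration only.

```python
def pct_drop(old, new):
    """Percentage reduction going from `old` to `new`."""
    return (old - new) / old * 100

# Price per 1M tokens (USD), as listed above.
input_drop = pct_drop(4.80, 1.15)
output_drop = pct_drop(16.50, 1.25)
print(f"input: {input_drop:.1f}% cheaper, output: {output_drop:.1f}% cheaper")

# V4 ratios relative to V3.2 at the same context length.
KV_RATIO, FLOP_RATIO = 0.10, 0.27

# Hypothetical V3.2 workload: 100 GB of KV cache, 1,000 TFLOPs per request.
v4_kv_gb = 100 * KV_RATIO
v4_tflops = 1_000 * FLOP_RATIO
print(v4_kv_gb, v4_tflops)
```

The V3.2 release cut output pricing by over 90%, and V4's savings land on memory and compute per request rather than on a listed price.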

~06:05 The attention stack

DSA (DeepSeek Sparse Attention) — separately-trained low-precision "lightning indexer" picks top-K tokens, throws the rest away. Drove V3.2's drop.
CSA — groups tokens into 4×-compressed entries before DSA top-K selection.
HCA — full attention but on tokens compressed 128×; preserves long-range global context cheaply. Interleaved layer-by-layer with DSA+CSA.^{[18]Caleb Writes Code — DSA/CSA/HCA explained}
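A rough feel for the three mechanisms, using only the compression ratios and the top-K idea named above. This is a counting exercise, not DeepSeek's implementation: the indexer scores are placeholder numbers, the helper names are hypothetical, and the real layers operate on learned key/value tensors.

```python
import math

def csa_entries(n_tokens, ratio=4):
    # CSA: tokens grouped into 4x-compressed entries before top-K selection.
    return math.ceil(n_tokens / ratio)

def hca_entries(n_tokens, ratio=128):
    # HCA: full attention, but over entries compressed 128x.
    return math.ceil(n_tokens / ratio)

def top_k_indices(index_scores, k):
    # DSA-style selection: keep the k highest-scoring positions, drop the rest.
    ranked = sorted(range(len(index_scores)),
                    key=lambda i: index_scores[i], reverse=True)
    return sorted(ranked[:k])

context = 1_000_000
csa = csa_entries(context)
hca = hca_entries(context)
kept = top_k_indices([0.1, 0.9, 0.3, 0.8, 0.05], k=2)
print(csa, hca, kept)
```

At a 1M-token context, the HCA layers attend over fewer than 8,000 entries, which is why full attention stays affordable there.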

The compounding-research thesis

Caleb's color-coded reading notes show that nearly every architectural choice in V4 traces back to a prior DeepSeek paper, including MHC (Manifold-Constrained Hyperconnections) for more expressive residuals. He frames it as "the spirit of what open-source is about."

Tools: DeepSeek V4, DSA, CSA, HCA, MHC, Claude Code, Codex

AI Models

Sam Witteveen

IBM Granite Speech 4.1: Three Open ASR Models, One an Hour-in-2-Seconds Transcriber

IBM dropped a three-model ~2B-parameter ASR family. The base model leads the Hugging Face Open ASR Leaderboard with 5.33 WER (~95% accuracy on real-world data) at RTFx ~231. The Plus model adds speaker diarization, word-level timestamps, and incremental decoding. The 2BN model uses a non-autoregressive "edit-the-draft" architecture (NLE) to hit RTFx 1820 on an H100 — an hour of audio in 2 seconds.^{[19]Sam Witteveen — Granite 4.1}

~01:02 Granite Speech 4.1 2B (base)

Six languages (English, French, German, Spanish, Portuguese, Japanese) plus bidirectional translation, automatic punctuation, true casing, and keyword biasing: pass a list of names/acronyms in the prompt and the model weights toward them. RTFx ~231: an hour of audio in ~16 seconds.
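RTFx is the inverse real-time factor, audio duration divided by processing time, so converting it to wall-clock time is a single division. The sketch below checks the two figures quoted for this model family. The helper name is hypothetical, and real throughput depends on hardware and batching.

```python
def seconds_to_transcribe(audio_seconds, rtfx):
    """Processing time implied by an RTFx score (audio time / compute time)."""
    return audio_seconds / rtfx

HOUR = 3600
base_seconds = seconds_to_transcribe(HOUR, 231)
fast_seconds = seconds_to_transcribe(HOUR, 1820)
print(f"RTFx 231:  {base_seconds:.1f} s per hour of audio")
print(f"RTFx 1820: {fast_seconds:.2f} s per hour of audio")
```

Both results line up with the rounded claims: roughly 16 seconds for the base model and roughly 2 seconds for the fastest variant.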

~04:03 Plus (diarization + timestamps + incremental)

Adds speaker-attributed ASR ("Speaker 1," "Speaker 2"), word-level timestamps that beat customized Whisper variants like WhisperX, and incremental decoding so chunked audio maintains speaker numbering. Trade-offs: drops to five languages (no Japanese), no translation, slightly higher WER.

~06:05 2BN (non-autoregressive edit pass)

Two-stage: a frozen CTC encoder makes a draft transcript, then a non-autoregressive LLM editing pass uses bidirectional attention to copy/insert/delete/replace tokens. Sidesteps the accuracy penalty that hit prior parallel-generation attempts. RTFx 1820 on an H100 with batching — hour of audio in ~2 seconds. No translation, biasing, diarization, or timestamps. Requires Flash Attention.^{[19]Sam Witteveen — 2BN architecture}

That literally means that you can be transcribing an hour of audio in 2 seconds on that hardware.
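The editing pass is what makes the speed possible: every draft position gets its own copy, insert, delete, or replace decision, so the decisions can be made in one parallel step instead of token by token. The sketch below applies a set of decisions to a draft. The per-position `(op, argument)` format is a hypothetical simplification, not IBM's actual scheme, and the edits are hand-written rather than predicted by a model.

```python
def apply_edits(draft, edits):
    """Apply one edit decision per draft token and return the new transcript."""
    out = []
    for token, (op, arg) in zip(draft, edits):
        if op == "copy":
            out.append(token)
        elif op == "replace":
            out.append(arg)
        elif op == "insert":
            # Keep the draft token, then add a new token after it.
            out.extend([token, arg])
        elif op == "delete":
            continue
        else:
            raise ValueError(f"unknown edit op: {op}")
    return out

# Draft with a repeated word, a missing word, and missing punctuation.
draft = ["the", "cat", "sat", "sat", "on", "mat"]
edits = [
    ("copy", None), ("copy", None), ("copy", None),
    ("delete", None), ("insert", "the"), ("replace", "mat."),
]
final = apply_edits(draft, edits)
print(" ".join(final))
```

No decision depends on another position's output, which is the property that lets a non-autoregressive pass run over the whole draft at once.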

Tools: IBM Granite Speech 4.1, Hugging Face Transformers, Flash Attention, WhisperX

AI Tools Developer Tools

Nate B Jones

OpenClaw Grows Up: Provider-Neutral Agent Runtime

Nate B Jones argues OpenClaw crossed from "viral demo" to actual infrastructure in April 2026 — task flow, scoped memory with provenance, mature handlers across Slack/Telegram/Discord/WhatsApp/Teams/Matrix, and provider manifests that route work across LLMs. The strategic point: with Anthropic restricting subscription-based agentic use and OpenAI integrating Codex into all paid ChatGPT tiers (and OpenClaw's creator now at OpenAI), builders should architect for provider-independence rather than picking sides.^{[20]Nate B Jones — OpenClaw maturation}

~00:00 What shipped in April

Task flow (durable multi-step orchestration with state and revision tracking), scoped memory with provenance, mature channel handling, provider manifests, sub-agents that run their own sessions and report back. The "boring" infrastructure markers — task queues, checkpoints, retry behaviors, permission profiles, tool boundaries.

~10:07 The provider war

Anthropic restricted Claude subs from powering always-on third-party agents — Nate's read is partly compute rationing, partly recognizing that flat-rate consumer pricing loses money on agentic workloads. OpenAI did the opposite: Codex is now in every ChatGPT paid tier, and OpenClaw's docs include a Codex OAuth route. Sam Altman publicly flagged OpenClaw availability under ChatGPT plans on May 1.^{[20]Nate B Jones — Model Layer Contestation}

The builder response should not be religious loyalty to any provider. It should be architecture.

~13:08 Gemma 4 fits the routing argument

Google's Gemma 4 (Apache 2.0) is positioned for agentic workflows, on-device, and edge inference — a credible local branch for cheap background classification, dedup, and triage that doesn't deserve frontier model pricing.

~15:09 Durable workflows + Open Brain

Nate's three example workflows: (1) GitHub repo operator using local model for triage, GPT-5.5/Codex for patches, Claude for architecture passes; (2) multi-layer email inbox review; (3) incident response across logs/Slack/GitHub/runbooks. He also released "Open Brain" — open memory recipes including a code-review store, task flow worklogs, and a memory-provenance recipe (observed/inferred/confirmed/imported labels).^{[20]Nate B Jones — Durable Workflows}
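The routing and provenance ideas above can be sketched in a few lines. This is a hypothetical illustration, not OpenClaw's or Open Brain's actual schema: the route table mirrors the repo-operator example (local model for triage, Codex for patches, Claude for architecture), and the fallback rule is an assumption.

```python
from dataclasses import dataclass

# Task kind -> preferred provider, following the repo-operator example.
ROUTES = {"triage": "local", "patch": "codex", "architecture": "claude"}
PROVENANCE_LABELS = {"observed", "inferred", "confirmed", "imported"}

@dataclass
class MemoryRecord:
    text: str
    provenance: str

    def __post_init__(self):
        # Reject memories that do not say where they came from.
        if self.provenance not in PROVENANCE_LABELS:
            raise ValueError(f"unknown provenance: {self.provenance}")

def route(task_kind, available):
    preferred = ROUTES[task_kind]
    if preferred in available:
        return preferred
    # Fall back to any available provider so the workflow keeps running
    # when a provider is down or changes its terms.
    return sorted(available)[0]

print(route("patch", {"local", "codex", "claude"}))
print(route("patch", {"local", "claude"}))
note = MemoryRecord("CI fails on Windows runners", "observed")
```

Keeping the route table as data rather than code is what makes a provider swap a configuration change instead of a rewrite.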

Build the runtime so the model can change. Build the memory so the user owns it. Build the workflow so it survives the session.

Tools: OpenClaw, Open Brain, Codex, Claude, GPT-5.5, Gemma 4, Ollama, LM Studio, OpenRouter

AI Tools

Simon Willison

Mozilla Hardens Firefox With Claude Mythos

Mozilla used a preview of Anthropic's Claude Mythos to surface hundreds of security vulnerabilities in Firefox during a coordinated hardening pass. Simon Willison logs the run as one of the more concrete examples of frontier-model-driven security work going from research curiosity to production defender tool.^{[22]Simon Willison — Firefox Claude Mythos hardening}

Industry

Morning Brew

Apple Settles AI Features Lawsuit for $250M

Apple is paying $250 million to settle a class-action over Apple Intelligence features promised but not delivered. Eligible buyers — iPhone 15 Pro / Pro Max owners and all iPhone 16 buyers from June 2024 to March 2025, ~37 million devices — get $25 to $95 per device. No admission of wrongdoing. Subtext: Apple is still behind on AI and is leaning on Google Gemini to power its delayed Siri overhaul.^{[23]Morning Brew — Apple Settles AI Features Lawsuit}

Industry

Sherwood News

The CPU Renaissance: AMD Doubles Its 2030 Server TAM

AMD posted a strong Q1 and doubled its forward CPU server TAM to $120B by 2030 (from $60B). Lisa Su's argument: the CPU-to-GPU ratio in AI data centers has shifted from 1:4 / 1:8 toward roughly 1:1, because agent workloads run many cheap routine tasks that don't need GPU-class compute.^{[6]Sherwood News — AMD CPU renaissance} Arm Holdings backs the thesis with $2B+ data-center CPU demand and a claimed 50% hyperscaler share. Nvidia put $500M into Corning for fiber-optics — the build-out is broad-spectrum.

The appropriate ratio used to be 1-to-4 or 1-to-8, but is now closer to 1-to-1 or potentially favoring more CPUs when deploying numerous agents. — Lisa Su, AMD CEO

The framing matters: agent inference is distributed across many less-intensive processes rather than concentrated in monolithic training runs, which elevates CPUs from "legacy afterthought" to first-class AI compute resource.^{[6]Sherwood News — Agent workloads shift compute mix}

Hot Take Industry

Nate B Jones

DeepSeek Caught Running 16 Million Fake Anthropic Accounts

Per Nate B Jones, DeepSeek was caught running ~16 million fake accounts to harvest Claude outputs as training data — large-scale industrialized exploitation of Claude for distillation. The story is consistent with Theo's broader arc: Anthropic's "dirty plays" aren't paranoia, they're a defensive moat around RL feedback data that competitors will try every avenue to access.^{[21]Nate B Jones — 16M Fake Accounts Stealing AI Capabilities}

AI Future

Lenny's Podcast

The Case for Malleable Software

A short clip from Lenny's Podcast on the "malleable software" thesis — that AI lowers the cost of building bespoke tools enough that end-users should own and reshape their own computing rather than living inside fixed apps. The clip is a teaser for a longer conversation but the framing is worth filing alongside this week's Anthropic Institute pillar on "AI Systems in the Wild."^{[24]Lenny's Podcast — The case for malleable software}

Podcast

AI Engineer

Samuel Colvin at AI Engineer: Playground in Prod (Pydantic)

Pydantic founder Samuel Colvin's AI Engineer talk argues that the gap between prototype and production for LLM apps is fundamentally an observability and iteration problem — and that what teams need is a "playground in prod": the ability to inspect, replay, and tweak live agent behavior with the same speed as a dev REPL. Demos Pydantic AI and Logfire as the tooling stack.^{[25]AI Engineer — Samuel Colvin: Playground in Prod}

Open the source video on YouTube for the full talk; the JSON summary captures only the section structure (thesis, Pydantic AI demo, Logfire integration, Q&A).

Tools: Pydantic AI, Logfire
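The "playground in prod" idea (inspect, replay, tweak) can be sketched with nothing but the standard library. This is a minimal illustration of the concept, not Pydantic AI's or Logfire's API: `Trace`, `TraceStore`, and `fake_model` are all hypothetical names, and `fake_model` stands in for a real LLM call so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A "model" here is any function of (system_prompt, user_input) -> output.
Model = Callable[[str, str], str]


@dataclass(frozen=True)
class Trace:
    """One recorded agent call: everything needed to rerun it later."""
    system_prompt: str
    user_input: str
    output: str


@dataclass
class TraceStore:
    traces: list[Trace] = field(default_factory=list)

    def record(self, model: Model, system_prompt: str, user_input: str) -> str:
        """Run the live call and keep its inputs and output for inspection."""
        output = model(system_prompt, user_input)
        self.traces.append(Trace(system_prompt, user_input, output))
        return output

    def replay(self, index: int, model: Model,
               system_prompt: Optional[str] = None) -> Trace:
        """Rerun a recorded call, optionally with a tweaked system prompt."""
        original = self.traces[index]
        prompt = original.system_prompt if system_prompt is None else system_prompt
        return Trace(prompt, original.user_input, model(prompt, original.user_input))


def fake_model(system_prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for an LLM call so the sketch runs offline."""
    style = "terse" if "terse" in system_prompt else "verbose"
    return f"[{style}] answer to: {user_input}"


store = TraceStore()
store.record(fake_model, "You are a verbose assistant.", "What is our refund policy?")

# Replay the same production input with a tweaked prompt and compare outputs.
tweaked = store.replay(0, fake_model, system_prompt="You are a terse assistant.")
print(store.traces[0].output)
print(tweaked.output)
```

The design choice worth noting is that the trace stores inputs as well as outputs. Without the inputs, a production record can be inspected but not replayed.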

Podcast

AI Engineer

Michael Arnaldi at AI Engineer: Vibe Engineering Effect Apps

Michael Arnaldi (Effectful) makes the case that the structured concurrency, error tracking, and dependency injection of Effect (TypeScript) are exactly the substrate "vibe engineering" needs to scale beyond toy apps. The talk demos how Effect's typed effect system lets agents reason about side effects, retries, and composition cleanly, and where pure LLM-generated code falls apart without it.^{[26]AI Engineer — Michael Arnaldi: Vibe Engineering Effect}

Podcast

AI Engineer

Raindrop at AI Engineer: Everything You Need to Know About Agent Observability

Raindrop founders Danny Gollapalli and Ben Hylak walk through the dimensions of agent observability that traditional APM tools miss — token-level cost attribution, tool-call success rates, decision quality vs. correctness, and the difference between failure and "soft failure" where the agent finished but did the wrong thing. Demos how Raindrop instruments those signals.^{[27]AI Engineer — Raindrop: Agent Observability}
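A rough sketch of three of those signals, computed from recorded runs. Nothing here reflects Raindrop's actual product or SDK: the `ToolCall` and `Run` records, the flat `USD_PER_1K_TOKENS` price, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical flat price, for illustration only.
USD_PER_1K_TOKENS = 0.01


@dataclass
class ToolCall:
    tool: str
    tokens: int
    succeeded: bool


@dataclass
class Run:
    run_id: str
    calls: list[ToolCall]
    completed: bool  # the agent ran to the end without erroring out
    goal_met: bool   # a separate check (human or grader) judged the result correct


def tokens_by_tool(runs: list[Run]) -> dict[str, int]:
    """Attribute token spend to the tool that incurred it."""
    totals: dict[str, int] = {}
    for run in runs:
        for call in run.calls:
            totals[call.tool] = totals.get(call.tool, 0) + call.tokens
    return totals


def cost_by_tool(runs: list[Run]) -> dict[str, float]:
    """Convert per-tool token totals to dollars at the assumed flat price."""
    return {tool: tokens / 1000 * USD_PER_1K_TOKENS
            for tool, tokens in tokens_by_tool(runs).items()}


def tool_success_rate(runs: list[Run]) -> dict[str, float]:
    """Fraction of calls to each tool that succeeded."""
    attempts: dict[str, int] = {}
    successes: dict[str, int] = {}
    for run in runs:
        for call in run.calls:
            attempts[call.tool] = attempts.get(call.tool, 0) + 1
            successes[call.tool] = successes.get(call.tool, 0) + int(call.succeeded)
    return {tool: successes[tool] / attempts[tool] for tool in attempts}


def classify(run: Run) -> str:
    """Separate hard failures from runs that finished but did the wrong thing."""
    if not run.completed:
        return "failure"
    return "ok" if run.goal_met else "soft_failure"


# Made-up sample data.
runs = [
    Run("r1", [ToolCall("search", 1200, True), ToolCall("write_file", 800, True)], True, True),
    Run("r2", [ToolCall("search", 600, False), ToolCall("search", 400, True)], True, False),
    Run("r3", [ToolCall("write_file", 500, False)], False, False),
]

print(tokens_by_tool(runs))
print(tool_success_rate(runs))
print([classify(r) for r in runs])
```

Run `r2` is the interesting case: it completed, so a traditional error-rate dashboard would count it as healthy, but the separate `goal_met` check marks it as a soft failure.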

Podcast

Latent Space

Latent Space Interviews Matt Pocock: Engineering Fundamentals in the AI Era

Matt Pocock argues that as LLMs absorb the easy work, the gap between engineers who genuinely understand types, async, error semantics, and architecture and those who don't widens rather than narrows. The interview covers how he uses agent harnesses for backlog work (see also his /triage skill below) while still hand-tuning the parts that actually matter.^{[28]Latent Space — Matt Pocock interview}

Podcast

EO

EO Interviews Ken Ono: AI, Math, and the Future of Education

Mathematician Ken Ono (Axiom Math) on what AI is and isn't doing for serious math research, what genuinely useful tutoring would look like vs. the current crop of homework-helpers, and which parts of math education survive the transition.^{[29]EO — Ken Ono interview}

Podcast

Every

Every Podcast: OpenAI vs. Anthropic — The Battle Lines Are Drawn

Every's podcast frames the post-Code-with-Claude landscape: Anthropic is doubling down on Claude Code as a fleet of harnesses (Design, Finance, etc.); OpenAI is centralizing on Codex and aggressive enterprise placement. Useful companion to Theo's deeper "who has researchers, data, compute" framework.^{[30]Every Podcast — OpenAI vs. Anthropic}

Podcast

Sequoia Capital

Sequoia Interviews Dick Costolo: Surviving Twitter's Growing Pains

Long-form Sequoia interview with former Twitter CEO Dick Costolo on running a hypergrowth platform under public scrutiny — a topical reread for anyone building AI products that will face the same content/governance/scale collisions Twitter did. The companion short clip — "the first goal he set as CEO was embarrassingly low" — pulls a teachable moment about expectation-setting in early leadership.^{[31]Sequoia — Dick Costolo full interview}^{[32]Sequoia — First goal as Twitter CEO short}

Developer Tools

Better Stack

Better Stack Ships DESIGN.md and a Claude Code Error MCP

Two Better Stack videos worth pairing. DESIGN.md is a design-system spec format you check in alongside README.md so AI-generated UIs stop looking generic — the agent reads it, applies the design tokens, and produces dramatically less identikit output.^{[33]Better Stack — DESIGN.md walkthrough} The second video wires Claude Code into Better Stack via MCP: production error fires, Claude reads it, reproduces locally, fixes, and ships a PR — closing the loop from telemetry to merged fix.^{[34]Better Stack — Claude Code + Better Stack debugging}

Productivity Developer Tools

Matt Pocock

Matt Pocock's /triage Skill for AFK Backlogs

Matt Pocock demos his /triage Claude Code skill on the Sand Castle repo: feed it a backlog of issues/PRDs, walk away, come back to a triaged list with proposed labels, owners, dupes, and quick-fix candidates. The framing — "PRDs and tickets as agent-ready backlog items" — is a useful reframe for how to write tickets in 2026.^{[35]Matt Pocock — /triage skill}

Developer Tools

Real Python

Real Python: Codex CLI for Python Projects

Real Python's installation-and-setup walkthrough for Codex CLI in a sample Python project (RP Contacts). Useful as a pointer if you've been Claude-Code-only and want to evaluate Codex's CLI ergonomics on a real codebase.^{[36]Real Python — Codex CLI for Python}

Developer Tools

Matt Williams

Tailscale Grants for VPS Hardening

Matt Williams (technovangelist) walks a clean VPS-lockdown pattern: block everything at the firewall, add the box to your Tailnet, then use Tailscale Grants ACLs for fine-grained per-service access from your devices only — the result is a public-IP server that is, for all practical purposes, not on the public internet.^{[37]Matt Williams — Tailscale Grants for VPS}
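As a rough sketch of the last step, the snippet below builds a tailnet policy fragment with a single grant and prints it as JSON. The user `alice@example.com`, the tag `tag:vps`, and the two ports are placeholders, not values from the video; the `grants` entry uses the `src` / `dst` / `ip` fields of Tailscale's grants syntax, and the firewall and tailnet-join steps are assumed to be done already.

```python
import json

# Placeholder identities and ports, for illustration only.
OWNER = "alice@example.com"   # hypothetical tailnet user
SERVER_TAG = "tag:vps"        # hypothetical tag applied to the VPS

policy = {
    # Who is allowed to assign the tag to a machine.
    "tagOwners": {SERVER_TAG: ["autogroup:admin"]},
    # One grant: only OWNER's devices may reach SSH and HTTPS on the tagged VPS.
    "grants": [
        {
            "src": [OWNER],
            "dst": [SERVER_TAG],
            "ip": ["tcp:22", "tcp:443"],
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Anything not matched by a grant is denied over the tailnet, so adding a service means adding a port to `ip` (or a new grant) rather than opening the public firewall.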

Developer Tools

Simon Willison

Simon Willison Tooling Roundup: llm-gemini 0.31, Big Words, GitHub Repo Stats

Three small Simon Willison releases worth bookmarking. llm-gemini 0.31 ships gemini-2.5-flash-lite as GA in his llm CLI plugin.^{[38]Simon Willison — llm-gemini 0.31} Big Words is a tiny browser-only tool for making text-only "presentation slides" — useful for sharing a fact in giant type without spinning up Keynote.^{[39]Simon Willison — Big Words} GitHub Repo Stats is another browser-only tool that pulls public repo metadata (stars, forks, issue counts, etc.) for quick eyeballing.^{[40]Simon Willison — GitHub Repo Stats}

Developer Tools AI Tools Industry

AICodeKing Fireship Arjay McCandless marimo Acquired Real Python Data Science Weekly

Quick Hits: GoodBarber, Fireship OS, $1 vs $1M System Design, marimo Slides, Acquired F40, DSW 650

Seven smaller items from the day, worth one line each.

AICodeKing on GoodBarber — a free native-app builder with AI features (CMS assistant, ChatGPT extension, RAG chatbot) and a low-code extension store. Pitched as "FREE Native AI App Builder Coder."^{[41]AICodeKing — GoodBarber}
Fireship — Every OS concept in one video — a tour through boot loader, privilege rings, virtual memory, file system, drivers, PID1, syscalls, scheduler, threads, IPC, shutdown. Useful primer if you've been hand-waving past your laptop's plumbing.^{[42]Fireship — OS concepts}
Arjay McCandless — $1 vs $1M System Design — same product, three architectural tiers: single server, mid-tier with observability, and large-scale distributed. Good for back-of-envelope cost-per-step intuition.^{[43]Arjay — System Design Tiers}
marimo — Moar Better Python Slides — marimo's revamped slides feature, demoing a Python-notebook-as-deck workflow.^{[46]marimo — Better Python Slides}
Acquired — Ferrari F40 short — minimalist engineering philosophy clip. "A raw death machine with no luxury."^{[45]Acquired — Ferrari F40 short}
Real Python — Programmers Do Their Best Work Away From the Desk — short on the "deep thinking happens off-keyboard" framing.^{[47]Real Python — Best Work Away From Desk}
Data Science Weekly Issue 650 — highlights include "Notes from inside China's AI labs," "A Decade of Being an Average Data Scientist," "Reflections on AI-assisted Programming," "TabPFN's in-context learning," and a piece on quantum ML for classical engineers.^{[44]Data Science Weekly — Issue 650}

Sources