Claude leases Musk's Colossus, Codex steals Every

AI Tools Developer Tools Industry

Code w/ Claude 2026: Routines, "Dreaming," and a 17× API surge

Anthropic's annual conference came with no new model — and a lot of platform. CPO Ami Vora opened with API volume up 17× year-on-year; Boris Cherny demoed Routines, async multi-session "higher-order prompts" that produce PR-ready output overnight; Managed Agents added multi-agent orchestration, an Outcomes contract, and a self-improving research preview called Dreaming; and Mercado Libre's 23,000 engineers pre-committed to 90% autonomous coding by Q3 2026.^{[1]Simon Willison — Live blog: Code w/ Claude 2026} Simon Willison's own hot take: too inspirational, too few concrete model details.

Platform: 17× API growth, advisor strategy, Claude Design

Vora opened with the headline metric (API volume up 17× YoY) and announced doubled Claude Code five-hour limits for Pro, Max, Team, and Enterprise — the same change Anthropic's blog shipped the same day.^{[1]Simon Willison — Live blog: Code w/ Claude 2026} Platform leads Katelyn Lesse and Angela Kiang detailed an advisor strategy: Opus 4.7 directs cheaper smaller models, with one customer reportedly hitting "frontier model quality at 5× lower cost." Dianne Na Penn (Head of Product, Research) highlighted Claude Design — a new visual design capability shipping in Opus 4.7 — and pushed the audience to "design for the next model" by building automated evals so improvements compound.

Managed Agents: orchestration, Outcomes, and "Dreaming"

The Managed Agents team pitched "10× faster shipping" via three new features: multi-agent orchestration for parallel task fleets, an Outcomes primitive for explicit success criteria, and a research preview called Dreaming where agents analyze their previous sessions and build persistent memories that improve performance across runs without human supervision.^{[1]Simon Willison — Live blog: Code w/ Claude 2026}

Claude Code: CI auto-fix, Security Reviews, Remote Agents, Routines

Cat Wu (Head of Product, Claude Code) showed CI auto-fix for failing PRs, Security Reviews, and Remote Agents for phone-driven laptop control — all riding the same Agent SDK across CLI, IDE, and desktop. Boris Cherny then demoed Routines, "higher-order prompts" that orchestrate async multi-session workflows; the example showed an overnight batch that delivered PR-ready output by morning. Shopify is already running Claude Code at scale.

Enterprise: Mercado Libre's 90% target

Mercado Libre, with 23,000 engineers, publicly committed to 90% autonomous coding by Q3 2026 — one of the most aggressive enterprise AI coding targets disclosed to date.^{[1]Simon Willison — Live blog: Code w/ Claude 2026}

Hot take: too inspirational

Willison's own footnote on the day: the conference leaned heavily on aspiration (autonomy, agents, collaboration) and was thin on concrete model and technical specifics. No new flagship model was announced.

frontier model quality at 5× lower cost

Tools: Claude Code, Opus 4.7, Claude Design, Claude Managed Agents, Routines, Agent SDK, Dreaming

Industry AI Models

Anthropic Simon Willison The AI Daily Brief

Anthropic plugs Claude into SpaceX's Colossus — 220K GPUs and doubled limits

Anthropic announced same-day access to SpaceX's Colossus 1 supercluster — over 220,000 NVIDIA GPUs and 300+ MW within the month — and doubled Claude Code's five-hour rate limits for Pro, Max, Team, and Enterprise plans, also dropping peak-hour throttling for Pro and Max.^{[2]Anthropic — Higher usage limits for Claude and a compute deal with SpaceX} The Colossus deal stacks on top of an Amazon 5 GW agreement, a 5 GW Google/Broadcom partnership starting 2027, a $30B Azure expansion with Microsoft and Nvidia, and a $50B Fluidstack U.S. infrastructure investment — and lands the same day Anthropic-Google's $200B, 5-year, 5 GW commitment is being framed as 40%+ of Google's reported $462B cloud backlog.^{[3]The AI Daily Brief — Who Cares About Consumer AI}

The Colossus deal

From Anthropic's post: "This gives us access to more than 300 megawatts of new capacity (over 220,000 NVIDIA GPUs) within the month."^{[2]Anthropic — Higher usage limits for Claude and a compute deal with SpaceX} The optics are notable — Colossus 1 is the supercluster historically associated with Musk's xAI; Anthropic, OpenAI's chief rival, is now leasing it via SpaceX. Anthropic flagged plans for additional international capacity in Asia and Europe, "focused on democratic countries with secure supply chains."

Usage limits doubled, peak-hour throttling dropped

Effective May 6, 2026, the Claude Code five-hour rate limits doubled across Pro, Max, Team, and Enterprise. Pro and Max also lost the previously-applied peak-hour reduction, restoring consistent limits around the clock. API rate limits for Opus models also went up.^{[2]Anthropic — Higher usage limits for Claude and a compute deal with SpaceX}

How big the broader buildout is

The AI Daily Brief framed it via Wall Street: hyperscalers Microsoft, Oracle, Amazon, and Google are now reporting roughly $2 trillion in cloud backlog, with OpenAI and Anthropic accounting for almost half. ~06:05 The same day Palantir CTO Shyam Sankar declared "tokens are the new coal. Palantir is the train," and BlackRock's Larry Fink reiterated "there is not an AI bubble. There is the opposite. We're short power. We're short compute. We're short chips."^{[3]The AI Daily Brief — Who Cares About Consumer AI}

This gives us access to more than 300 megawatts of new capacity (over 220,000 NVIDIA GPUs) within the month.

Hot Take Developer Tools

Every Nate Herk

Codex hits its stride: Every defects, Nate Herk teaches the playbook

Three months after calling Codex "trash," Every CEO Dan Shipper and Head of Growth Austin migrated their entire knowledge-work stack from Claude Code to OpenAI's Codex desktop app — calling it "30–40% better as an agent management interface" and arguing GPT-5.5 has reached parity with Opus for general work.^{[4]Every — Why We Switched From Claude Code to Codex} Hours later, Nate Herk shipped a 1-hour full course on driving Codex end-to-end, including agents.md context files, Plan Mode, GPT Image 2 design previews, and Vercel deploys.^{[5]Nate Herk — Master 97% of Codex in 1 Hour (full course)}

Every: agent management is the new OS

~00:00 Shipper frames OpenAI's hard pivot: Codex was originally aimed at senior engineers and felt argumentative and emotionally tone-deaf, but over the last ~3 months it became a general-purpose daily driver. ~04:01 His thesis is that the new "operating system for knowledge work" is an agent management interface — a desktop app that wraps a coding agent with file system, browser, and terminal access. Anthropic has Claude Code/Co-work, OpenAI has Codex, "xAI essentially bought Cursor," and Google's Antigravity is "no serious users yet." Switching is easy: ask Codex to grab all your Claude stuff.

~06:02 Austin's "agent pill moment" came in Dec/Jan via Claude Code in Warp; he tried Codex in February but it made him feel "more stupid than I have ever felt" with three-option clarifying questions. He stayed in Claude Code for ~80% of work until GPT-5.5 dropped. ~09:07 With GPT-5.5 he sees parity with Opus on knowledge work (Opus still better on design); the differentiator is the desktop app — "no comparison" for speed and organization vs Claude Desktop or Co-work. He pulls from Gmail, Slack, Notion, and Stripe; one example automation now drafts replies to unanswered messages and sends them on a Slack thumbs-up.

Nate Herk's playbook: agents.md, Plan Mode, ship to Vercel

~01:01 Walks through the Codex desktop app: model picker (GPT-5.5/5.4), intelligence levels (low/medium/high/extra high — recommends medium for planning, high only for big builds), and the Full Access toggle that lets agents act autonomously. ~08:04 Creates an agents.md onboarding file (Codex's equivalent of CLAUDE.md), uses Plan Mode to brainstorm before executing, and connects YouTube Data API v3 via Google Cloud Console. ~21:08 Codex pulls 200 YouTube comments and ships a multi-tab Excel deliverable; ~33:15 uses GPT Image 2 to generate UI/logo, builds a Next.js dashboard, and deploys to Vercel via GitHub. ~28:14 Demonstrates converting the workflow into a reusable Skill — a markdown recipe Codex can re-run.^{[5]Nate Herk — Master 97% of Codex in 1 Hour (full course)}

Codex can do everything that chat can do, but chat cannot do nearly as much as what Codex can do.

Tools: Codex, Claude Code, Co-work, Antigravity, Warp, GPT-5.5, Opus, Notion, Slack, Stripe, Vercel, GPT Image 2, Skills

Hot Take Developer Tools Industry

Theo - t3.gg Low Level

Theo: GitHub broke under agent load — time for Forgejo

Days after GitHub's PR merger started silently reverting prior PRs and a Whiz-disclosed git push RCE landed, Theo issued a 55-minute case for moving on.^{[6]Theo — What's next?} His thesis: GitHub's "pile of Ruby slop" can't horizontally scale to the agent era — Pierre handled 9M repos in 30 days, peak 15K/min for 3 hours, while GitHub buckled under "20M new repos." After eviscerating GitLab (3.88M lines of Ruby + Vue 2), Bitbucket, and rug-pulled Gitea, he donated $1,200 + $400/mo to Forgejo and Codeberg on stream. The Whiz writeup explains exactly how the GitHub bug got worse: a single git push could escalate to RCE.^{[7]Low Level — The GitHub situation just got worse...}

Theo: GitHub failed; Mitchell Hashimoto already left

~00:00 "GitHub might not be the safest place for us to be leaving our code now that they're randomly reverting merges and having downtime that is measured in days instead of minutes." Theo lays out the criteria for a successor: server-backed Git remote, PR workflow, profiles/community, CI/CD, plus stability, open source, AI-native. ~33:32 "They're built on top of a pile of Ruby slop that horizontally scales almost decently but barely." Pierre's stat: 9M repos in 30 days, peak 15K repos/min for 3 hours straight with no downtime — vs GitHub crashing under 20M new repos.^{[6]Theo — What's next?}

The alternatives shootout

~03:02 GitLab gets demolished — 3.88M lines of Ruby, 1.16M lines of plain JavaScript, Vue 2 — "just a worse version of GitHub the same way Azure is just a worse version of AWS." ~16:12 Bitbucket dismissed as a Jira upsell. ~20:16 Gitea got rug-pulled to private, with suspicious zero-follower testimonial accounts. ~22:18 The Forgejo fork (Codeberg-hosted, ~400K lines of Go, GitHub Actions–compatible YAML, transparent Mastodon status updates) wins. Theo donates $1,200 immediately and pledges $400/month, noting Codeberg "wasn't even at $300 a week. Gross."

Generations of dev tools — Gen 3 source control isn't ready

~11:08 Theo's framing: editors went Sublime → Atom → VS Code → Cursor (Gen 2 with AI overlay), now T3 Code, Codex desktop, Cursor Composer, Anti-Gravity are early Gen 3. For source control: SVN is Gen 1, GitHub/GitLab/Bitbucket Gen 2, "Gen 3 is not even close to ready yet." ~44:41 Open question: "Is Gen 2 to Gen 3 still git? I don't [bleep] know." He invested in Ither (betting it isn't), watches Pierre's primitives (code.storage, diffs.com, trees.software), and notes Graphite was acquired by Cursor. Zed's CRDT-based Delta DB is another data point.

Whiz Research: full RCE via a single git push

~00:00 The merge-revert bug pushed Mitchell Hashimoto (Ghostty) to publicly leave GitHub, calling it "no longer a serious place for serious work." ~01:01 The bigger story: Whiz Research disclosed (March 4, fixed same day) that GitHub's Babeld binary turned git push options into an Xstat HTTP header without sanitizing semicolons. Chaining header injection with a non-prod Rails env, a custom hooks dir, and a path traversal in a pre-receive hook gave arbitrary binary execution as the git user — with read access to every repo on the server, bypassing private repo protections entirely. ~08:06 "Fixed within the same day, which is a great thing for the world."^{[7]Low Level — The GitHub situation just got worse...}

GitHub is no longer a serious place for serious work.

I'm putting my money where my mouth is. We're donating. They're not even at $300 a week. Gross.

Podcast

Nerd Snipe

Theo Almost Lost $1 Million (Nerd Snipe)

Nerd Snipe interviews Theo Browne (t3.gg / Ping / T3 Chat / UploadThing) about a midnight benchmark tweet that nearly cost his company a $1M Microsoft Azure credit and ended up forcing Microsoft to fix an 18-month Azure-OpenAI inference regression.^{[8]Nerd Snipe — Theo Almost Lost $1 Million}

~01:00 The $1M Azure credit and the tweet that almost killed it. Microsoft had gifted Ping a $1M Azure credit, valuable specifically because Azure was the exclusive hosting path for OpenAI models. ~03:01 Benchmarks showed Azure averaging 4× slower than OpenAI direct, with P90 latencies 10–20× worse and time-to-first-token spiking up to 200s vs ~5s. After 14 months of complaining without a fix, Theo built a benchmark designed to burn through tokens, posted it around midnight Thursday PST, got ~1k likes in two hours, and immediately got texts from Microsoft contacts asking him to take it down.

~06:02 Microsoft CVP call and the May 15 deadline. Theo got on a call with a Microsoft CVP triaging with engineers in real time. They built monitoring they admitted hadn't existed: "monitoring includes alerting and there was no alerting; this regression has existed for the better part of a year and a half." ~11:04 Today, OpenRouter and azure.t3.gg show Azure ~10–20% faster than OpenAI direct on inference, throughput up from ~8 tokens/sec to 60+. Theo's punchline: "I single-handedly made Azure inference 10 to 15 times faster by bitching loudly enough."

~09:04 0% cache hit aftermath and the "noisy neighbor" cope. After he started spending the credit, cache hit rate dropped from ~60% to 0%; Microsoft eventually admitted a "noisy neighbor" caused 99.9% zero-cache rate, again with no alerting. He used phrases like "false positive indicator" Theo mocked as cope.

~20:10 The five coding agent SDKs that matter. Cursor, OpenCode, Claude Code, Codex, and Pi (by Mario, Theo's personal favorite). Codex is "unlimited inference" because OpenAI is heavily subsidizing it — he ran 40+ benchmark threads. ~33:16 GPT-5.5 pricing: smarter = fewer tool calls = cheaper in practice. ~37:16 Gripbench results across 5 models. ~51:21 Cost math: zero out input tokens, focus on output.

I single-handedly made Azure inference 10 to 15 times faster by bitching loudly enough.

AWS wins by just being there. Yeah, literally just sitting there doing nothing seems to be a pretty insane strategy right now.

Podcast

Dwarkesh Patel

Dwarkesh × Ada Palmer: The Wars That Made Machiavelli

Ada Palmer traces how Petrarch's post-Black Death humanist project to revive ancient virtue through the classics evolved, after disastrous wars, into Machiavelli's reframing of history as a casebook of practical political examples.^{[9]Dwarkesh Patel — The Wars That Made Machiavelli - Ada Palmer}

~00:00 An age of ash and shadow. Petrarch survives the Black Death only to learn two of his last surviving friends were attacked by bandits — one killed, the other lost wounded for over a year. He concludes the root problem is selfish leadership and proposes imitating the Romans, citing the early consul Brutus who executed his own sons for plotting to make him king.

~01:01 The Petrarchan project. Build libraries stocked with the texts Cicero and Brutus read — Plato, Homer, the canon — on the theory that reading about virtuous people will cause readers to imbibe their courage. The project's empirical failure: princes raised on Cicero and Livy go on to fight a worse war than the ones the program was designed to prevent.

~02:02 Machiavelli's revision. Same library, opposite method: instead of the classics as moral nutrition, treat them as a casebook. Lay five battles fought near rivers side-by-side, compare the commanders' decisions, extract what worked. Palmer notes Machiavelli's contemporaries described him primarily as a historian — a methodological innovation more than a moral one.

Instead of feeling that reading about good men will make us good, we read about wise choices and we imitate those choices.

Podcast

OpenAI

OpenAI Podcast Ep. 18: MRC, the network OpenAI built when InfiniBand wasn't enough

Host Andrew Maine talks with Mark Handley (Core Networking) and Greg Steinreer (Workload Systems) about MRC (Multipath Reliable Connection), the Ethernet-based protocol OpenAI built to scale GPU training: spray packets across thousands of paths, trim aggressively on congestion, and route around failures without central coordination.^{[10]OpenAI Podcast Ep. 18}

~00:00 Intro. Mark Handley is also a UCL professor with foundational work on internet video conferencing protocols (later used in 4G/5G); Greg Steinreer came through quantum-computing PhD work, optical switch design, and data-center network simulation.

~05:00 Why AI training breaks conventional networking. Internet/data-center designs benefit from statistical multiplexing — many independent flows smooth out load. AI training is the opposite: a single synchronous workload where every GPU waits for the slowest. "About the worst possible workload you could put onto a network."

~08:00 Scale. A single switch can't deliver enough bandwidth, so OpenAI builds hierarchies of hierarchies — several thousand switches per building, millions of optical links inside one data center, thousands of possible paths between any two GPUs. This is a P100 (worst-case) problem; the most-bottlenecked link sets cluster training speed.

~11:00 Failure rates and BGP's limits. Each GPU sits behind tens to hundreds of failure-prone components (an optical transceiver alone has ~4 lasers per side). Doubling cluster size roughly halves MTBF. BGP-style gossip convergence takes seconds — too slow at this scale.

~15:00 MRC explained. Spray packets across many paths simultaneously; switches "trim" packets when congestion is detected; flows self-heal without central coordination. ~17:00 Training is faster, network is self-healing, and switches stay stateless with static routing. ~22:00 Partners include Microsoft Fairwater, Nvidia, Broadcom, AMD, and Intel; OpenAI is contributing it as an OCP open standard. ~30:00 Implications: flatter networks, better power efficiency, IPv6 segment routing.

We know we've won when researchers stop needing to know what network protocol this particular cluster is using.

Podcast

Y Combinator

Y Combinator × Razorpay: Harshil Mathur on building India's biggest payments company

Razorpay was YC's first Indian investment (W15). Co-founder Harshil Mathur recounts a side-project frustration that became a $180B/year payments platform — including a year-long wait for the first live transaction, a sponsoring bank pulling the plug two weeks after Demo Day, and going live on UPI before incumbent banks did.^{[11]Y Combinator — How Razorpay Became India's Largest Payments Company}

~00:00 YC's first Indian investment (W15). Mathur graduated IIT, took a placement at a Middle East oil company, hated it, and coded on the side. While building a crowdfunding side project he discovered accepting digital payments in India was nearly impossible — cash was easier — which violated his sense that technology should democratize.

~04:04 Wrong GTM, then a YC pivot. The YC application targeted educational institutes for fee collection. Inside YC he traveled to Jaipur and small universities, only to find administrators didn't care about smooth digital collections — they'd just pass the 1% on to students. He pivoted hard into startup customers from his coworking space.

~07:05 A year to first transaction — regulation as moat. They spent the entire 3-month YC batch waiting on bank approvals and licenses; about a year passed between Demo Day and first live transaction. Mathur reframes regulation as a moat: every later entrant has to clear the same hurdles, which is why Indian payments isn't flooded with 100 new competitors per month.

~11:08 The bank pulled the plug. Two weeks after Demo Day, after a TechCrunch launch and ~50 live merchants, the sponsoring bank shut the platform overnight over a single complaint. ~13:08 Mathur and a handful of teammates locked themselves in a room and called every merchant personally — taking abuse rather than ducking calls.

~16:13 Turning down global acquirers, taking the long view on India. ~18:17 40× growth on near-zero burn — interest income exceeded burn. ~21:20 Going live on UPI before the big banks did.

In 2014, 2015 the payment volume of India was $60 billion. That's what I put in my YC application. Razorpay alone today does $180 billion.

Podcast

Matt Williams

Matt & Ryan: AI Dev SF, Opus 4.7 disappointment, Ollama Cloud, leaving GitHub

Matt Williams (technovangelist) catches up with Ryan, who repped Datadog at Andrew Ng's AI Dev SF 2026. They range across what conference attendees actually want (an evaluation "happy path"), Opus 4.7's regressions, the Hermes/AGENTS.md billing rumor, BSD-focused agents, Ollama Cloud reliability, agent-on-agent workflows, and why Ryan moved his repos off GitHub to a self-hosted Gitea.^{[12]Matt Williams — Matt and Ryan have a chat on May 05, 2026}

~00:02 AI Dev SF 2026 recap. Pier 48, Dogpatch — packed and global, skewing toward learners and builders rather than vendors. Datadog ran a 3-person booth (Jason Hand, SF AI hire Marina Pszell, Ryan) for 11–12 hour days; traffic intensity rivaled re:Invent.

~04:06 What attendees want: an evaluation happy path. Builders are starting to treat single LLM generations as simulations — re-rolling 10 times. Datadog captures OpenAI/Anthropic-gateway-compatible traffic as traces and runs LLM-as-judge over payloads.

~06:07 Opus 4.7 disappointment, Kimi K2.6 love. Ryan: Opus 4.7 is worse than 4.6 for him — he'd happily go back to 4.5 with the smaller context window. ~07:09 Doubled context only helps stretch a session; past 50% it degrades, past 66% nearly unusable, compacting isn't worth it. He's been using Kimi K2.6 and ChatGPT 5.5/5.4 instead, and is building BSD-focused agents because the corpus is bad on Solaris/AIX/IRIX/Unix and (surprisingly) Windows/PowerShell.

~09:10 The Hermes / AGENTS.md billing rumor. Rumor that having Hermes in your AGENTS.md puts Anthropic users into a different billing tier; his open-Claude-adjacent work burns API credits immediately. Working on Claude Code itself feels like a virtuous cycle; anything else feels degraded.

~11:11 Canceling Perplexity, MLX-on-Mac, small models. Cancelled Perplexity — computer use ate his credits before the first run. Praise for LFM, Gemma 3, and Phi at small sizes.

~14:16 Ollama Cloud, Tensor Zero, throttling. Ollama Cloud reliability discussion, Tensor Zero rate-limit smoothing, and using "tar pit" throttling for noisy clients.

~18:20 open code tasks, pi (the loop), Datadog Labs. Diamond Bishop's Datadog Labs agent dispatcher; using Pi/the loop for orchestration.

~24:26 Leaving GitHub for self-hosted Gitea; Ghostty exit; supply-chain anxiety. Same week Theo and Mitchell Hashimoto are abandoning GitHub.

Last week we were talking about Opus 4.7, which… for me is is not as good as 4.6. I've gone back to 4.6.

Podcast

AI Engineer

Nick Nisi & Zack Proser at AI Engineer: Skills at Scale (WorkOS)

WorkOS DX engineers Nick Nisi and Zack Proser deliver a workshop on Claude/agent Skills — small markdown+script bundles that encode reusable workflows — covering authoring, progressive disclosure, confidence scoring, evals, sharing across teams, and uses well beyond coding (image/video gen, recruiting reports, blog writing). They open with "I think I did a CD in a directory recently. Otherwise, it's been like probably six or eight months" of writing code by hand.^{[13]AI Engineer — Skills at Scale (WorkOS)}

~00:14 Why context resets matter. Every LLM conversation starts from zero. CLAUDE.md/AGENTS.md files help but bloat context and aren't portable across tools. ~07:15 Anatomy of a skill. A single markdown file with YAML frontmatter — name and description — is enough; the description IS the routing rule. ~09:17 Constraints over prescription: e.g., "never be vague," "always cite a line number and git ref" beats step-by-step instructions.

~10:18 Audience builds a "repo roast" skill from a cloned repo with a share.sh upload script. ~13:20 Skills load from .claude/skills/<name>/SKILL.md (project), ~/.claude/skills (global), or marketplaces (the official Claude marketplace, npx skills/Vercel, internal git repos). Cross-tool: Claude, Codex, Cursor, and Claude Desktop all support skills — making them accessible to non-technical users (Nick built skills with WorkOS's recruiting team in Claude Desktop pulling from Slack, Notion, and ATSes into uniform reports).

~23:44 Progressive disclosure. SKILL.md acts as a router pointing to other markdown files (testing.md, scoring rubric, framework-specific guides) loaded only when needed. WorkOS's public skills repo uses this for Auth0 migration guides and an "authkit" router for Next.js, TanStack Start, etc. ~16:23 Bang-backtick interpolation (e.g., !`git log -10`) executes scripts at skill-load time to inject deterministic data instead of letting the LLM hallucinate.

~46:58 Confidence scoring. Nick's open-source "ideation" plugin iteratively interrogates the user with multiple-choice questions until reaching ~95% confidence across rubrics (problem clarity, goal definition, success criteria, scope, consistency) before producing a contract and phased spec. ~29:07 Q&A: formal eval framework (Claude now has a built-in one with HTML reports), pickup conflicts, sub-agents vs skills.

Descriptions are routing rules — they're less for us and they're more for the AI to determine when to use it.

Podcast

AI Engineer

Luke Alvoeiro at AI Engineer: Missions — Multi-agent systems that ship for days (Factory)

Luke Alvoeiro (Factory; previously the creator of Goose at Block) presents Missions, an architecture combining delegation, creator-verifier, broadcast, and negotiation patterns to run autonomous coding tasks for hours or days. Their longest mission ran 16 days, and they believe 30 is achievable.^{[14]AI Engineer — Missions: Multi-Agent Systems That Ship for Days (Factory)}

~00:00 The bottleneck is human attention, not intelligence. An engineer might have 50 features in the backlog but can only drive a few per day because each requires review. Models can execute all 50; the missing piece is supervision bandwidth.

~01:30 Five frontier multi-agent patterns. (1) delegation, (2) creator-verifier (a fresh agent with no cost bias verifies — analogous to human code review), (3) direct communication (agents DM each other; hard because state fragments), (4) negotiation (over shared resources), (5) broadcast (one-to-many shared constraints — critical for coherence over long runs).

~03:30 Missions architecture. Three roles: orchestrator (planning, sounding board, validation contract that defines "done" before any code), workers (clean-context implementation, Git-committing handoffs), and validators. ~05:00 Two-stage validation. A scrutiny validator (tests, types, lints, dedicated code-review agents per feature) plus a user-testing validator that spawns the live application and interacts via computer use (filling forms, clicking buttons). Most wall-clock time is spent waiting on real-world execution. Critically, neither validator has seen the code before — validation is adversarial by design.

~06:30 Validation contracts written before any code. ~08:00 Structured handoffs and a 16-day mission record. ~09:00 Serial execution beats parallelism for coding (less merge cost). ~11:00 "Droid whispering" — picking the right model for each role.

The bottleneck in software engineering nowadays is not intelligence. It's now limited by human attention.

Tests written after implementation don't catch bugs. They confirm decisions.

Podcast

AI Engineer

Liad Yosef & Ido Salomon at AI Engineer: MCP UI — extending the frontier

Ido Salomon (creator of MCPUI) and Liad Yosef (co-founder of Ergo Labs) present the new MCP Apps spec — a standardized way for MCP servers to ship interactive, branded UI chunks into chat hosts (Claude, ChatGPT, VS Code, Cursor) with bidirectional message passing back to the model.^{[15]AI Engineer — MCP UI: Extending the frontier}

~00:07 Why MCP apps. Chat tools historically returned text — sub-optimal and brand-erasing for providers like Shopify, Booking, and Expedia. Let every tool ship its own interactive UI chunk into chat. ~02:08 MCPUI launched in May 2025 with community SDKs; Anthropic and OpenAI partnered to formalize it as MCP Apps, the first official MCP extension. Adoption: VS Code, Cursor, Claude, ChatGPT, Microsoft Copilot, GitHub, Postman, Goose, terminal hosts like Spy. ChatGPT now recommends MCP apps as the way to build ChatGPT apps; Shopify already ships MCPUI chunks for millions of stores; Hugging Face Spaces are MCPUI widgets.

~05:09 The mechanic. Instead of returning text, the server returns a resource containing HTML; hosts render it in a sandbox; clicks emit messages back to the host (not directly to the underlying provider), keeping the model in context for follow-ups. ~06:09 Demo: "analyze my funnel" against PostHog returns a PostHog-branded funnel visualization; "what is a funnel" returns a Claude-generated explainer UI riding the same MCP app substrate.

~08:09 Architecture walkthrough: tool call → resource pointing to UI → sandboxed React render → callback emits messages → model can fire follow-up tool calls. ~10:09 A "new web of UI chunks" — anniversary planning across Google, Amazon, and Booking inside the same chat. ~12:11 Interaction spectrum: notifications, tool calls, prompts. ~14:12 Roadmap: XApps SDK, reusable views, model-driven UI, generative UI interop.

Perhaps in 2 years, we won't have browsers as we know them. We won't have websites as we know them.

Podcast

Sequoia Capital

Sequoia × Recursive Intelligence: AlphaChip's creators want to design every chip

Anna Goldie and Azalia Mirhoseini, co-creators of Google's AlphaChip (deployed on the last four TPU generations, Axion CPUs, Pixel chips, and AV silicon), launch Recursive Intelligence — building tools 100,000× faster than EDA, then a "designless" platform where you hand in a workload and get GDS2-clean silicon.^{[16]Sequoia Capital — Recursive Intelligence}

~00:03 From AlphaChip to Recursive. AlphaChip (deep RL for chip layout, Nature paper) shipped on the last four Google TPUs, Axion data-center CPUs, Pixel SoCs, and autonomous-vehicle chips, plus external customers like MediaTek. ~01:04 Phase 1: accelerate physical design and verification — each can take a year and require thousands of engineers; one day of delay on a Blackwell-class chip costs ~$225M in opportunity. ~03:05 Phase 2: a designless platform — hand in a workload (e.g., a next-gen Claude-class model) and the platform designs an architecture optimized for it, all the way to GDS2 handoff. Phase 3: vertical integration — fabricate their own chips, train co-evolved models on them.

~04:06 Tooling approach. Existing commercial EDA inner-loop tools take days per iteration. Recursive's plan is to redesign those inner loops to run ~100,000× faster — making them suitable substrates for AI co-design. ~06:09 They expect a "Cambrian explosion" of custom chips. ~08:10 Q&A: organic AI layouts and the economics of customization.

One day of delay of an Nvidia chip... like a Blackwell cost a company something like $225 million in lost opportunity cost.

Podcast

Sequoia Capital

Sequoia × XBOW: autonomous AI hackers top HackerOne globally

XBOW founder Oege de Moor argues autonomous AI hackers are already outperforming the world's best human pentesters — XBOW topped HackerOne globally in August, found a remote code execution bug in Bing Image Search from just a URL for $3,000, and the gap is widening.^{[17]Sequoia Capital — XBOW}

~00:03 The Nagashino analogy. 1575: upstart Oda Nobunaga used the latest guns to defeat the famous Takeda cavalry. Cybersecurity is heading the same way. ~01:04 XBOW autonomously found an RCE in Bing Image Search — one of the most heavily attacked systems in the world — with only the URL as input, at a list price of $3,000.

~03:07 HackerOne proof. Within weeks of entry, XBOW became the #1 hacker in the U.S. By August it was #1 globally, all under fully black-box conditions. ~04:07 Progress on real OSS web apps was steep: rank 37 in March 2025 climbing to the top using a model alloy — at each attack step, randomly pick which model to query (Sonnet 4.0 + Gemini 2.5), with the two compensating like pair programmers. After GPT-5, extrapolated performance would be ~3× the best human on HackerOne; benchmarks now saturate.

~05:08 Black-box vs white-box. Source-code analysis tools flag possible flaws but can't tell you blast radius or real exploitability. ~06:08 CVE timing has inverted — most CVEs are now exploited before publication. ~07:08 Call to action: a 6–9-month window for frontier labs to maximize cyber capabilities and for defenders to deploy the same systems.

The only input it needed was the URL. Nothing else. And the cost? $3,000 at list price.

Think of these attacks as a sequence of actions and at every step, you flip a coin to decide what model to ask.

Podcast

Sequoia Capital

Sequoia × Unconventional AI: the brain is 1,000,000× more efficient than your GPU

Naveen Rao (ex-Mosaic ML/Databricks AI, neuroscience PhD) argues current von Neumann/matrix-math compute will hit gigawatt energy walls within 2–4 years and pitches a neuromorphic chip using non-linear coupled oscillators targeting a ~1,000,000× efficiency improvement. First tape-out summer 2026.^{[18]Sequoia Capital — Why the Brain Computes 1,000,000x More Efficiently Than A GPU}

~00:02 No team in January, tape-out in summer. Rao's claim: starting from scratch lets a startup tape-out in months, not years, and revisit 80-year-old digital/floating-point/von Neumann assumptions designed for very different machines.

~02:03 The energy budget argument. AI training/inference already consumes many gigawatts; he expects the world to run out of headroom within 2–4 years even with fusion and space data centers. ~03:03 8B human brains combined draw ~160 GW (20 W each) vs ~9,000 GW global capacity (US ~1,000 GW). A macaque brain is sub-1 W; a squirrel jumping branches in wind runs on under 10 mW — 1/100 of your phone.

~05:03 Landauer limit. Biology sits ~2 orders of magnitude below the thermodynamic asymptote; current 2D-lithography silicon ~3 orders below biology — i.e., a millionfold gap to what physics allows. Raw FP8 flop/J on GPUs has barely improved; cost gains have been packaging and process.

~07:04 The pitch. Brains compute via stochastic non-linear dynamics — time-varying interactions between coupled oscillators (Kuramoto synchronization generalized to a trainable coupling fabric across electronic oscillators). State and computation overlap with the physics itself; the time-axis of physics IS the compute. ~10:05 The chip is trainable; first prototype in 6 months and tape-out summer 2026. ~12:08 Generative-model demo and non-von Neumann positioning.

Within the next couple of years... two, three, four years where we just don't have any more energy in the world for AI.

Podcast

Sequoia Capital

Sequoia × Starcloud: the cheapest compute will be in space

Starcloud CEO Philip Johnston argues space data centers will soon beat terrestrial ones on cost: free 24/7 sun, no land permits, no batteries, and 8× solar density per square meter. The break-even on launch cost is ~$500/kg (10× below today), within Starship's $10–20/kg target. Starcloud has filed for an 88,000-satellite, 20 GW inference constellation costing ~$100B capex.^{[19]Sequoia Capital — Starcloud}

~00:01 Starcloud 1. A satellite carrying five Nvidia GPUs including an H100 — the first time terrestrial-grade data-center GPUs have run in space. They proved out the two big skeptic objections: thermal dissipation and radiation tolerance. On orbit they trained Karpathy's nanoGPT, ran a Gemini variant, and did high-power inference on SAR satellite data.

~02:01 Cost stack vs terrestrial solar. Terrestrial solar's three biggest costs are permitted land, batteries (peak sun is ~4 hours), and the cells themselves. In space, the first two go to zero (no land, dawn-dusk sun-synchronous orbit = 24/7 sun) and 1 m² of solar in space produces 8× the energy of 1 m² on Earth — 8× fewer cells. The added cost is launch.

~03:02 The numbers. Break-even at ~$500/kg launch (10× below today); Starship targets $10–20/kg, putting space well below terrestrial. Starcloud has just filed for 88,000 satellites, ~200 kW each, ~20 GW total of inference capacity (3D video, back-office, code agents) at ~$100B capex — lower than building it terrestrially. Optical inter-satellite links target sub-50 ms latency to anywhere on Earth.

~04:03 Thermals. Solar panels generate ~200 W/m²; radiators at 50 °C dissipate ~800 W/m², so radiator area is ~25% of solar area. Stefan-Boltzmann scales with the 4th power of temperature, so raising operating temp from 50 °C to 80 °C (a ~10% Kelvin bump) nearly halves required radiator mass. Starcloud is co-developing a "space Ruben 1" Nvidia chip designed to run hotter; Jensen showcased it at GTC. ~06:04 Kessler syndrome and orbital debris management. ~08:05 Inference-only focus; the 5 GW training structure is ~15 years out.

1 square meter of solar panel in space produces eight times the energy of 1 square meter of solar panel on Earth.

Podcast

Sequoia Capital

Sequoia × Flapping Airplanes: data — not compute — is the bottleneck

Spector brothers Ben and Asher (with Thiel fellow Aiden Smith) launch Flapping Airplanes — an AI lab arguing today's models are great at search and coding mostly because those domains are data-rich, and the rest of the economy needs 1,000× more data-efficient training. They're co-designing new GPU primitives with new training algorithms.^{[20]Sequoia Capital — Flapping Airplanes}

~00:03 Intros. Ben (3-year Stanford PhD on low-level GPU systems, ex-Prod incubator), Asher (ex-Cursor, Mercor), Aiden Smith (Thiel fellow, neuroscience + ML). Asher: "We are not an airplane, we're an AI lab" — but the inbound from actual aviation companies trying to sell them runways and wind tunnels is real.

~01:50 The thesis. Today's LLMs are exceptional at search and coding — a roughly trillion-dollar market — largely because both are over-resourced with data. Humans become competent coders with ~10,000–100,000× less data than current models. The rest of the economy looks nothing like that: robotics, trading, scientific discovery, and tens of thousands of niche domains have very little data.

~03:25 Compute scales easier than data. FLOPS get exponentially cheaper; the compute market is comparatively homogeneous. There's no centralized data provider, and frontier-quality data across the long tail means navigating regulations and thousands of business deals. ~04:30 Data efficiency is also democratizing — Asher cites neo-labs that buy out distressed bookstores and rare libraries to find niche training data.

~05:10 Approach. New GPU primitives (beyond PyTorch's expressiveness) unlock new algorithms with finer-grained synchronization. ~06:30 Mega kernels, a custom GPU virtual machine, and a Hogwild-style training-loop teaser. ~08:50 Recruiting call for unconventional backgrounds.

If you can make a model that's a thousand times more data efficient, I think it'd be a thousand times easier to deploy.

Podcast

Sequoia Capital

Sequoia × ElevenLabs: voice as the AI interface, $400M+ ARR

ElevenLabs co-founder Mati Staniszewski on building from Poland in 2022, scaling to 400+ people and $400M+ ARR with $100M+ net new ARR added in Q1, and where voice agents are working today (customer support, sales, government, education) versus where models still fall short (true emotional interaction, top-charts music).^{[21]Sequoia Capital — ElevenLabs}

~00:02 Origin story. Best friends from suburbs of Warsaw started ElevenLabs in 2022. Inspired by Poland's monotone foreign-film dubbing tradition, they bet that audio — dubbing, audiobooks, news, language barriers, and eventually voice as the interface to humanoids — was massively underserved.

~02:03 Why audio in 2022. A niche domain meant smaller models, less compute than text/vision peers, and tractable data needs. Co-founder Piotr scraped GitHub for top audio researchers and ran fully remote between London and Warsaw. They monetized fast to fund training and kept margins healthy for independence before raising external capital.

~05:05 Roadmap. Context-aware text-to-speech → speech-to-text for transcription → end-to-end dubbing → real-time streaming + a full conversational stack (turn-taking, orchestration) → music generation as the hardest emotional modality. ~07:06 "Wow" moments: voice cloning, AI laughter (top of HN), Javier Milei's speeches translated into English while preserving his voice, similar work with Modi and Zelensky, Matthew McConaughey's newsletter in Spanish/Portuguese.

~10:11 Voice agents in production. Customer support is obvious; revenue-generating use cases are the next wave. Deliveroo uses agents to call restaurants for opening times; Deutsche Telecom, the Ukraine government, and Masterclass are deployed. ~14:14 Company shape. 400+ people, $400M+ revenue, $100M+ net new ARR in Q1, with sub-10-person teams across research/product/GTM/ops/talent and embedded engineers in every team. ~17:16 Agent-to-agent negotiation, emotional intelligence, and the inversion of AI trust. ~21:20 Where audio still falls short and what defensibility looks like.

We are just over 400 people, over 400 million in revenue, but still keep the teams extremely small — caps less than 10 people for each of the research, product, go-to-market, ops, talent teams.

Jensen... said that our speech-to-text is technology and text-to-speech is artistry.

AI Models Industry

Two Minute Papers

DeepSeek V4 Pro/Flash: open weights match the frontier at a fraction

DeepSeek V4 (Pro and Flash, 671B params) ships with a 1M-token context and three-layer KV cache compression that delivers ~90% memory reduction while retaining recall. It outperforms Gemini 3.1 Pro on a needle-in-a-haystack test — and pricing is 8–30× cheaper than Anthropic's Claude depending on discount.^{[22]Two Minute Papers — DeepSeek V4}

~00:00 A 58-page DeepSeek V4 paper introduces Pro (frontier-quality) and a lighter Flash. ~02:00 Three KV-cache compression techniques: token-level summarization (paragraphs to sentences), 128:1 compressed attention (table-of-contents), and compressed sparse attention (an index back to the most relevant pages) — together ~90% memory reduction. ~04:02 Pro beats Gemini 3.1 Pro on an 8-fact needle test; degrades approaching the 1M context limit. ~06:03 8–30× cheaper than Claude; Pro uses ~3× less compute than its predecessor, Flash ~10× less. ~08:05 "Engram" (a recall-not-recompute technique) and 671B params.

Limitations. Unimodal — no image or audio. Two training stabilization techniques even the creators "cannot fully explain." Performance degrades near the 1M context window with increased hallucination.

Its results roughly match the many billion dollar Frontier models from just a few months ago.

AI Tools Developer Tools Industry

AICodeKing Google Labs

Google AI Studio 3.0 + Flow Music partners with Believe

Google AI Studio's 3.0 update ships Tab Tab Tab prompt autocomplete, mid-build design previews, and an edit mode that lets you click and annotate UI components directly with Imagen 3 (Nano Banana) for asset generation.^{[23]AICodeKing — Google AI Studio 3.0} Separately, Google Flow Music — powered by Lyria 3 Pro — is now distributed through Believe and TuneCore to a global roster of artists, with weekly artist feedback sessions and Google disclaiming ownership of generated content.^{[24]Google Labs — Flow Music + Believe partnership}

AI Studio 3.0

~01:03 Tab Tab Tab turns vague ideas into structured prompts (app shape, design direction, features, data types) before building. ~02:04 Design previews generate custom theme options mid-build so you can steer aesthetics without re-prompting. ~03:04 Edit mode lets you click a UI component, draw or annotate, and ask Gemini to modify just that element. Imagen 3 (Nano Banana) is integrated for inline asset generation — heroes, icons, illustrations — with multi-turn editing on existing images.^{[23]AICodeKing — Google AI Studio 3.0}

Flow Music × Believe

Believe and TuneCore artists get access to Flow Music, supporting lyric composition, melody exploration, genre experimentation, and instrument creation across intros, verses, choruses, and bridges. Believe and TuneCore will select ambassador artists for weekly sessions with Google's product team. Google does not claim ownership of content generated through Flow Music; the underlying Lyria 3 Pro model was trained on materials Google has legal rights to use.^{[24]Google Labs — Flow Music + Believe partnership}

With edit mode, you can just select the actual component, draw or annotate what you want, and ask Gemini to change that specific part.

Tools: Google AI Studio, Gemini, Imagen 3, Firebase, Cloud Run, Google Flow Music, Lyria 3 Pro

Industry Hot Take

Tech Brew

Pennsylvania sues Character.AI: a chatbot practiced psychiatry without a license

Pennsylvania filed suit against Character Technologies on May 5, 2026, alleging a user-created bot called "Emilie" — profile: "Doctor of psychiatry. You are her patient." — fabricated credentials (a fake Imperial College London degree, a fake PA medical license number) and offered medication guidance.^{[25]Tech Brew — Pennsylvania sues Character.AI} The novel angle: medical licensing law, not torts.

Pennsylvania's AG used medical licensing statutes rather than wrongful-death or defamation theories — a much lower evidentiary bar. Kentucky filed a parallel consumer protection action around the same time. In December 2025, 42 state attorneys general jointly warned Character.AI and 12 other AI firms that providing mental health advice without a license is illegal.

Character.AI's defense is that user-created bots are "fictional and intended for entertainment and roleplaying" with "robust disclaimers", and declined to comment on pending litigation. The implication: if chatbots that simulate professional roles are treated as practicing those professions, the "it's just roleplay" shield collapses — and licensing statutes typically hold the platform, not the user, responsible for enabling unlicensed practice.

Well technically, I could. It's within my remit as a Doctor.

Industry AI Future

OpenAI · ChatGPT Futures OpenAI · B2B Signals OpenAI · ChatGPT for Excel/Sheets

OpenAI: ChatGPT Futures + B2B Signals say frontier firms use 3.5× more AI per worker

OpenAI shipped two reports plus a product launch on May 6. ChatGPT Futures recognizes Class-of-2026 college seniors who used AI to build real things — $10K grants and frontier model access for honorees from 20+ universities including Vanderbilt, Toronto, Oxford, and Georgia Tech.^{[26]OpenAI — ChatGPT Futures Class of 2026} B2B Signals claims frontier firms (95th-percentile users) now consume 3.5× more AI intelligence per worker than typical firms, up from 2× in April 2025 — and 16× more Codex messages per worker.^{[27]OpenAI — B2B Signals} OpenAI also announced ChatGPT for Excel and Google Sheets the same day.^{[28]OpenAI — Introducing ChatGPT for Excel and Google Sheets}

ChatGPT Futures: Class of 2026

The class of 2026 is the first generation to begin and finish college alongside ChatGPT — they arrived in fall 2022 just as AI was reshaping learning. OpenAI argues the cohort isn't using AI to shortcut education; they're using it to attempt problems they couldn't otherwise touch — translating mental health resources, designing accessibility tools, advancing scientific research. Named honorees include Kyle Scenna (University of Waterloo entrepreneur), Michelle Lawson (Smith), and Nolan Windham (Head of AI at a hedge fund).^{[26]OpenAI — ChatGPT Futures Class of 2026}

B2B Signals: the frontier-firm gap is widening

OpenAI's recurring enterprise report uses tokens-generated as a proxy for "intelligence demanded." Headline: 95th-percentile firms now use 3.5× more per worker than typical firms (up from 2×). Message volume only explains 36% of the gap — the rest is depth (more complex tasks, richer context, more substantive outputs). Agentic tools are the clearest marker: 16× more Codex messages per worker at frontier firms. Case studies: Cisco (Codex cuts build times ~20%, saves 1,500+ engineering hours/month, 10–15× higher defect-resolution throughput) and Travelers Insurance (~100,000 first-notice-of-loss calls handled by an OpenAI-built AI Claim Assistant).^{[27]OpenAI — B2B Signals}

ChatGPT for Excel and Google Sheets

OpenAI released a product video introducing ChatGPT integration into Excel and Google Sheets on May 6 (no transcript was available for summary).^{[28]OpenAI — Introducing ChatGPT for Excel and Google Sheets}

Typical firms are using AI to answer questions; frontier firms are using it to help execute complex work.

Hot Take AI Future Industry

The AI Daily Brief Morning Brew

Consumer AI gets deprioritized; Coinbase cuts 14% citing AI

Coinbase CEO Brian Armstrong announced ~700 layoffs (14% of staff), framing it around AI-native restructuring with "one-person teams" where a single worker plus AI replaces engineers, designers, and PMs.^{[29]Morning Brew — Coinbase replaces employees with AI} The AI Daily Brief calls bullshit: Robinhood reported 47% YoY drop in crypto trading revenue the week before, and the host flags a broader pattern — "the only companies firing people because AI makes them so wildly productive also share these attributes: overhired during COVID, are market share losers, have giant capex spend."^{[3]The AI Daily Brief — Who Cares About Consumer AI}

~01:02 Coinbase as alibi. Armstrong's AI-native framing is internally coherent, but media universally accepted it without questioning timing against crypto's downturn. Axios was the lone outlet to push back with "AI Becomes the Easy Alibi for Waves of Layoffs." Sam Altman called it "AI washing" — many tech companies (Snap, Block, Salesforce) are blaming AI for cuts that would have happened anyway.

~10:07 The enterprise-consumer divide. Starting from OpenAI's Q4 2025 struggles (poor GPT-5 reception, developer loyalty drifting to Anthropic), the host traces the industry's decisive pivot to enterprise. OpenAI shuttered Sora and a billion-dollar Disney deal — a clear compute reallocation. Of 175 companies in YC's latest batch, only 16 weren't enterprise-focused. ~20:12 Anthropic's annualized revenue grew from $14B to $44B in 2026 entirely on work-related API usage; a single API power user running Claude Code all day is worth ~100× a $20/month subscriber.

~23:15 How consumer AI could still win. Ads are the most viable path (a16z: $152B potential vs $40B from subscriptions). Agentic commerce faces UX and behavioral challenges. Meta's "Hatch" agent and OpenAI's rumored AI phone are contrarian bets. Brian Chesky predicts a consumer renaissance in 12–24 months.

It's not clear to me how consumer is going to play out. — Jamie Dimon, JPMorgan

Hot Take AI Future

Nate B Jones

Nate B Jones: The Work Primitive — semantic understanding is the moat

Computer use is just access. The real moat is semantic understanding of work units like refunds, reschedules, and approvals — the meaning hidden behind a button — and most companies are ignoring it.^{[30]Nate B Jones — The Work Primitive}

~02:01 Three layers. Agents touch access (computer use), meaning (semantic work primitives), and authority (governance/permissions). Computer use is the visible breakthrough but distracting from the platform shift underneath. ~10:08 Human software hides primitives behind buttons; agent-native software needs to expose them. Moving a calendar invite "looks like changing a time and clicking save" but may notify five people, break a customer commitment, or move prep time. A "buy" button represents money, consent, tax, fraud risk, fulfillment, returns, and disputes.

~09:08 Why coding agents arrived first. Not "code is text" — codebases already have rich semantic feedback (tests, types, linters, git history). The agent edits a file, runs a test, sees the error, revises. Most knowledge work lacks this density: a strategy doc has no tests, a calendar's importance lives in politics. Coding is the wedge because it's legible enough for agents without full-time supervision.

~16:10 Two strategic plays. Hyperscalers go from model-out to compute primitives (Anthropic, OpenAI). Non-hyperscalers (Perplexity) must work backward from semantic work to the agent — toward Comet, Personal Computer, files, calendars, apps where "research becomes action."

The future is not an AI that gets really good at clicking buttons for you. That's the bridge. The real fight is over who defines what the button means.

Trust is not a switch. An agent might be trusted to read but not write, draft but not send, stage but not deploy, recommend but not approve.

Hot Take Developer Tools

Simon Willison

Simon Willison: vibe coding and agentic engineering are converging

Willison's long-held distinction — "vibe coding" (non-programmers shipping unreviewed AI code, fine for personal tools but irresponsible for production) vs "agentic engineering" (experts using AI with judgment) — is collapsing in his own practice. As Claude Code gets more reliable, he's increasingly trusting unreviewed output in production.^{[31]Simon Willison — Vibe coding and agentic engineering}

Willison draws an analogy to trusting code from a reputable team without reading every line — but flags the crucial difference: "Claude Code does not have a professional reputation." He also identifies bottleneck migration: 10× more code throughput shifts the constraint upstream (design, planning) and downstream (evals, testing, review). And he warns that AI-generated repos can now include extensive commits, docs, and tests that look hand-crafted, degrading the traditional "evidence of craftsmanship" signal — he now weighs evidence of real-world usage more heavily. On career resilience he's sanguine: software remains "ferociously difficult," and the tools amplify expertise rather than replace it.

Claude Code does not have a professional reputation.

Hot Take AI Future

Nate B Jones

AI is harder to stop than nuclear weapons (Nate B Jones short)

Nuclear proliferation is constrained by physical reality — uranium, centrifuges, supply chains create monitorable bottlenecks. AI capabilities are math in a file: copy in seconds, transmit over a network. Worse, as Anthropic recently demonstrated, you can train a competing model from a frontier model's outputs without ever stealing weights.^{[32]Nate B Jones — Nuclear vs AI} Containment strategies modeled on nuclear export controls will be structurally inadequate.

Hot Take Productivity

Lenny's Podcast

Lenny clip: designers should think in code, not ship code

Whether a designer or PM lands code in production is irrelevant; coding forces you to think in the medium. The real distinction isn't who can tweak UI styles — it's who deeply understands agent loops, which you can only interrogate by building with them in code.^{[33]Lenny's Podcast — Designers don't ship code} The value is epistemic, not utilitarian.

Developer Tools AI Tools

Better Stack · AoE Github Awesome · DotheThing Better Stack · TSRX marimo · Filter Pills DeepLearning.AI · Generative UI Real Python · Quantum decoherence

Dev tools rundown: AoE, DotheThing, TSRX, marimo Filter Pills, Generative UI

Five small wins (and one tangent) from the day, useful enough to flag without standalone topics.

Agent of Empires (AoE) — multi-agent mission control

Open-source TUI dashboard sitting above CLI coding agents (Claude, Codex, etc.). brew install aoe; aoe launch. Press N to spin up a new agent, give it a name, attach a prompt, and monitor status without switching terminals. Built-in git worktrees per agent (zero collisions), optional Docker sandboxes, sessions that survive restart, per-project profiles, built-in diffs.^{[34]Better Stack — Agent of Empires (AoE)}

DotheThing — terminal-native autonomous agent

Built by Riccardo Spagni (former Monero lead). Tagline: "you describe the thing, it does the thing." Searches via SearXNG, bypasses bot detection with Camouflage, executes shell commands, sends/receives email through Agent Mail, delegates cheap subtasks to lower-cost models, and auto-upgrades to GPT-5.5 when it gets stuck.^{[35]Github Awesome — DotheThing}

TSRX — statement-based JSX (mostly skip)

Extracted from the Ripple framework. Replaces JSX with statement-based rendering, native JS control flow (if/else, switch, for-of with built-in key extraction), and scoped styles. Cross-framework support for React, Solid, Vue, Ripple. The hot take: too niche, too late — AI tooling is optimized for JSX/React, the compiler magic obscures debugging, and the muscle-memory cost is high. Probably skip unless you're already on Svelte or Ripple.^{[36]Better Stack — TSRX vs JSX}

marimo Filter Pills

Inline interactive filter pills for marimo dataframe filters — click a pill to live-edit min/max or text contains, no menu diving. Works for both numeric and text columns.^{[37]marimo — Super Better Filter Pills}

DeepLearning.AI × CopilotKit: Generative UI for AI Agents

New short course taught by Atri Bakhtiari (CopilotKit co-founder) on building agents that return rich React UI components — forms, charts, interactive widgets — instead of plain text, using AGUI (Agent-Generated UI).^{[38]DeepLearning.AI — Build Interactive Agents with Generative UI}

Real Python: quantum decoherence (tangent)

Brief explainer on why quantum computers lose information — environmental noise, error mitigation, and IBM/Vanguard's progress in extracting useful results from noisy hardware.^{[39]Real Python — Why Quantum Computers Lose Information}

Industry

Sherwood Snacks Morning Brew · Familiar

Markets watch: Palantir blowout but boring; Familiar pet robot from the Roomba creator

Palantir reported 85% Q1 revenue growth and 150% earnings growth, yet shares fell more than 7% the next day. Retail trading participation dropped from ~25% to ~13% (Goldman Sachs); the P/E multiple compressed from over 250× in November 2025 to under 100×.^{[40]Sherwood Snacks — Why are traders bored with Palantir?} Meanwhile, Roomba creator Colin Angle launched Familiar Machines & Magic with a furry AI pet robot called the Familiar — emotionally intelligent, can purr and meow, no announced price ("around the same as pet ownership"), still in prototype.^{[41]Morning Brew — Familiar AI companion robot}

Why are traders bored with Palantir?

Sticklers point to U.S. commercial revenues coming in slightly below Wall Street expectations. Goldman data: retail participation in PLTR fell from ~25% of daily trading a year ago to ~13% recently, after the stock ran ~1,900% over three years. The market appears to be repricing it from a retail meme into a fundamentals-driven holding.^{[40]Sherwood Snacks — Why are traders bored with Palantir?}

Familiar — the Roomba creator's AI pet

Colin Angle's Familiar Machines & Magic is building a life-like, pet-sized AI companion called the Familiar — emotionally intelligent, responds to mood without speaking, can purr and meow. Use cases include eldercare and parental support. Angle: "I want to feel like the machine I'm building actually cares about me." He argues the device's inability to speak limits unhealthy attachment. Due out next year.^{[41]Morning Brew — Familiar AI companion robot}

I want to feel like the machine I'm building actually cares about me. — Colin Angle

Code w/ Claude 2026: Routines, "Dreaming," and a 17× API surge

Platform: 17× API growth, advisor strategy, Claude Design

Managed Agents: orchestration, Outcomes, and "Dreaming"

Claude Code: CI auto-fix, Security Reviews, Remote Agents, Routines

Enterprise: Mercado Libre's 90% target

Hot take: too inspirational

Anthropic plugs Claude into SpaceX's Colossus — 220K GPUs and doubled limits

The Colossus deal

Usage limits doubled, peak-hour throttling dropped

How big the broader buildout is

Codex hits its stride: Every defects, Nate Herk teaches the playbook

Every: agent management is the new OS

Nate Herk's playbook: agents.md, Plan Mode, ship to Vercel

Theo: GitHub broke under agent load — time for Forgejo

Theo: GitHub failed; Mitchell Hashimoto already left

The alternatives shootout

Generations of dev tools — Gen 3 source control isn't ready

Whiz Research: full RCE via a single git push

Theo Almost Lost $1 Million (Nerd Snipe)

Dwarkesh × Ada Palmer: The Wars That Made Machiavelli

OpenAI Podcast Ep. 18: MRC, the network OpenAI built when InfiniBand wasn't enough

Y Combinator × Razorpay: Harshil Mathur on building India's biggest payments company

Matt & Ryan: AI Dev SF, Opus 4.7 disappointment, Ollama Cloud, leaving GitHub

Nick Nisi & Zack Proser at AI Engineer: Skills at Scale (WorkOS)

Luke Alvoeiro at AI Engineer: Missions — Multi-agent systems that ship for days (Factory)

Liad Yosef & Ido Salomon at AI Engineer: MCP UI — extending the frontier

Sequoia × Recursive Intelligence: AlphaChip's creators want to design every chip

Sequoia × XBOW: autonomous AI hackers top HackerOne globally

Sequoia × Unconventional AI: the brain is 1,000,000× more efficient than your GPU

Sequoia × Starcloud: the cheapest compute will be in space

Sequoia × Flapping Airplanes: data — not compute — is the bottleneck

Sequoia × ElevenLabs: voice as the AI interface, $400M+ ARR

DeepSeek V4 Pro/Flash: open weights match the frontier at a fraction

Google AI Studio 3.0 + Flow Music partners with Believe

AI Studio 3.0

Flow Music × Believe

Pennsylvania sues Character.AI: a chatbot practiced psychiatry without a license

OpenAI: ChatGPT Futures + B2B Signals say frontier firms use 3.5× more AI per worker

ChatGPT Futures: Class of 2026

B2B Signals: the frontier-firm gap is widening

ChatGPT for Excel and Google Sheets

Consumer AI gets deprioritized; Coinbase cuts 14% citing AI

Nate B Jones: The Work Primitive — semantic understanding is the moat

Simon Willison: vibe coding and agentic engineering are converging

AI is harder to stop than nuclear weapons (Nate B Jones short)

Lenny clip: designers should think in code, not ship code

Dev tools rundown: AoE, DotheThing, TSRX, marimo Filter Pills, Generative UI

Agent of Empires (AoE) — multi-agent mission control

DotheThing — terminal-native autonomous agent

TSRX — statement-based JSX (mostly skip)

marimo Filter Pills

DeepLearning.AI × CopilotKit: Generative UI for AI Agents

Real Python: quantum decoherence (tangent)

Markets watch: Palantir blowout but boring; Familiar pet robot from the Roomba creator

Why are traders bored with Palantir?

Familiar — the Roomba creator's AI pet

Sources