Plugins eat the prompt; voice AI still isn't Her

AI Tools Hot Take

The Harness Mental Model: Prompts, Skills, Plugins, MCPs, Hooks

Nate B Jones spends 28 minutes drawing the cleanest map I've seen of what actually sits between an LLM and useful work in 2026.^{[1]Nate B Jones — You're Wasting 40% Of Your AI Time On Something Fixable} His thesis: most people over-index on prompting, and the way to get 10× more done is to understand the harness layer — prompts, skills, plugins, MCPs, hooks, scripts — as Lego bricks, with the plugin as the natural unit of packaged work. If you're a non-engineer being told by your team "this should be a skill, not a prompt" and you have no idea what that means, this is the explainer to send your CEO.

The decision tree

Jones walks through each primitive in order: ~04:00 prompts are for one-offs; ~05:00 skills are repeatable processes encoded as markdown ("just a clear markdown document that describes in good detail how you do that work"); ~08:00 plugins wrap skills + MCPs + hooks + assets + commands into one installable workflow; ~10:00 MCPs and connectors are how agents reach live data; ~12:00 hooks and scripts handle the deterministic parts you should never trust the model with (formatters, schema validators, tests).^{[1]Nate B Jones — You're Wasting 40% Of Your AI Time On Something Fixable}

If you do it once, it's a prompt. If you do it repeatedly, it's a skill. If the workflow needs to travel or other people need to install it, if it needs tools or assets or connectors — guess what, it's a plugin. If it needs access to another system, it's an MCP. If the workflow has to be verified, you got to add a check. — Nate B Jones

The plugin is the right unit of work

Jones's strongest claim: the App Store analogy for plugins is too small. ~14:00 Don't ask "what plugins can I install?" — ask "what part of my work has enough repeatable structure that the agent should be able to inherit it?" He argues finding good plugin boundaries — drawing edges around a workflow that's a clean unit of value — is a high-paying skill in 2026 that very few people have. Customer service isn't one plugin; it's three or four (refunds, activations, upgrades), each with sharp boundaries.^{[1]Nate B Jones — You're Wasting 40% Of Your AI Time On Something Fixable}

Why this is a non-engineer's job

~21:00 Jones makes the case that in 2026 — unlike 2025 — building plugins is a non-engineer activity, because the domain knowledge of what good looks like sits with the person doing the work, not the person who can write code. He cites an editorial worker who built a plugin for first-pass editorial review (three reading passes, comment placement, factual flags) without coding skills. The dig at hyperscalers: stop waiting for Claude or ChatGPT to launch your dream plugin — they won't.^{[1]Nate B Jones — You're Wasting 40% Of Your AI Time On Something Fixable}

The Claude Design tell

~20:00 Jones reads Claude Design (launched two weeks earlier) as "a fancy plugin with a UI for Claude for design" — meaning the plugin pattern was so important Anthropic made it a product. Take the hint: scaffolding wins.

Tools: Claude Code, Codex CLI, Claude Design, MCP servers, Figma, Salesforce connectors

AI Tools Developer Tools

Nate Herk

Printing Press: CLI Factory That Beats APIs and MCPs by 35×

Nate Herk demos Printing Press, a new CLI factory from OpenClaw creator Peter Steinberger that turns any API (or screen-scraped site) into an agent-friendly CLI in roughly 10 minutes.^{[2]Nate Herk — This is The Most Powerful Tool to Give to Claude Code} Herk's benchmark: MCP burns 35× more tokens than an equivalent CLI on the same task and reliability drops from 100% to 72% as task difficulty grows. The pitch is that the same agent that called Skool (no public API) only paid 2,000 context tokens for a 132,000-token underlying response — because the CLI summarized before returning.

Why APIs and MCPs both lose for agents

~03:00 Herk's framing: APIs were built for code, MCPs were built for tools, CLIs were built for agents. APIs dump raw JSON; MCPs balloon context with tool descriptions you may never invoke; CLIs return short pre-formatted text and use local SQLite mirrors so there's no round-trip or rate limit per call.^{[2]Nate Herk — This is The Most Powerful Tool to Give to Claude Code}

What's in the library

Pre-built CLIs include ESPN, Flight Goat, Movie Goat, Recipe Goat, Linear, Amazon, Craigslist, eBay, TikTok Shops, Shopify, Hacker News, and a "Contact Goat" that does verified-email lookups via LinkedIn + happenstance cross-checks. ~06:00 Steinberger built his own GOG CLI to replace Google's official GWS CLI because the official one was bad — that itch became the whole product.^{[2]Nate Herk — This is The Most Powerful Tool to Give to Claude Code}

The factory part

~10:00 Building a custom CLI is a natural-language prompt: "use the CLI factory to make a Hacker News CLI." It outputs a Go binary plus a Claude skill so cloud code can invoke it with plain English. Requires `go` to be installed (one-line ask to Claude). Auth quotas still apply — wrapping an API in a CLI doesn't dodge rate limits, just context bloat.^{[2]Nate Herk — This is The Most Powerful Tool to Give to Claude Code}

Tools: Printing Press, Claude Code, GOG CLI, GWS CLI, Skool, Hacker News, OpenClaw

AI Tools Developer Tools

AI Jason

Goal-Mode Coding: Codex /goal, Hermes persist, and Mission Loops

AI Jason traces the evolution of agent loops from Ralph-loop's "for-loop the coding agent" trick to OpenAI's new /goal command in Codex and Hermes Agent's persist feature — both of which insert an LLM-as-judge between iterations so agents stop declaring premature victory.^{[3]AI Jason — Ralph-loop 2.0? The real autonomous coder is coming...} He reports running Codex with /goal for 9 hours overnight on a real JS→TS migration and shares the prompt patterns that actually keep agents honest about "done." Beyond coding, his team has been experimenting with multi-day "missions" — Twitter growth, ad-spend optimization — that schedule their own next run.

Why dumb for-loops fail and judge-loops work

~02:00 Original Ralph-loop just re-prompts the agent in a `while` loop up to a max iteration count — dumb but effective for things like "fix all failing tests." /goal upgrades that with an explicit "definition of done" LLM call after each iteration. If the judge says incomplete, the agent gets a continuation prompt that lists the goal file and pushes for "next concrete steps." Codex's continuation prompt explicitly bans proxy completion signals: "only marks a goal achieved when the audit shows the objective has actually been achieved."^{[3]AI Jason — Ralph-loop 2.0? The real autonomous coder is coming...}

Activation and best-practice prompt shape

~04:00 Run codex features list, then codex features enable goal, then inside Codex use /goal <objective>. AI Jason's example: "migrate my codebase from JavaScript to TypeScript and make sure all screens stay visually identical, using Playwright interactive to verify the output." Good goal prompts are bigger than one prompt and smaller than an open-ended backlog. Include: what to achieve, what NOT to change, how to validate, when to stop.^{[3]AI Jason — Ralph-loop 2.0? The real autonomous coder is coming...}

Vincent (OpenClaw) field notes

~07:00 Vincent has been running /goal on OpenClaw for three days across 30 rounds. His learnings: interview the agent before kicking off (project context, what bad looks like, kinds of bugs you keep hitting); quantify done ("once you find 20 discrete new issues" beats "until everything is fixed"); for net-new projects, list anti-pattern files and expected user behavior.^{[3]AI Jason — Ralph-loop 2.0? The real autonomous coder is coming...}

Goalbody: scaffolding the goal prompt itself

~08:30 Open-source npx goalbody drops a /goal-prep skill that interviews you to construct the goal.md file plus a state.yml task list. Use /goal goal.md instead of inline prompt — Codex updates state.yml on each loop.^{[3]AI Jason — Ralph-loop 2.0? The real autonomous coder is coming...}

Missions: when hours aren't enough

~11:00 /goal is built for hours-long coding sessions with verifiable end states; it falls down on multi-week non-deterministic objectives like growing a Twitter following. AI Jason's team is testing "missions" — mission.md plus scheduled re-runs (hours/days/weeks apart) with persisted artifacts and human-in-the-loop escape hatches. Early result on a "grow to 10K followers" mission: agent observed first tweet performance, switched to founder-voice + thread format, second tweet beat baseline.^{[3]AI Jason — Ralph-loop 2.0? The real autonomous coder is coming...}

Tools: Codex CLI /goal, Hermes Agent persist, Ralph-loop, Goalbody, Playwright, OpenClaw, AutoResearch

AI Tools Developer Tools

Better Stack

Archon: YAML DAGs and Git Worktrees for Deterministic Agents

Better Stack walks through Archon, an open-source local agent harness that fixes the "same prompt, different output" problem with three primitives: YAML DAG workflows, per-run git worktrees so parallel agents never collide, and auto-loaded skills.^{[4]Better Stack — AI Agents Are Random… This Fix Makes Them Deterministic} Demoed on an M4 Pro running locally — no cloud — with a UI that shows exactly which step in the YAML pipeline broke when something fails.

The three changes

~02:00 (1) YAML DAGs as a checklist the agent must follow — some steps are AI, some are fixed scripts; (2) every run gets its own git worktree, so multiple agents can run in parallel without merge conflicts; (3) skills auto-load context per workflow instead of you stuffing it into prompts.^{[4]Better Stack — AI Agents Are Random… This Fix Makes Them Deterministic}

How it compares

Better Stack positions Archon against LangChain ("great, but built for general bots, not code") and one-off scripts ("not reusable, not versioned, not discoverable"). The framing matches the broader harness thread today — defined process beats prompt-tweaking.^{[4]Better Stack — AI Agents Are Random… This Fix Makes Them Deterministic}

Tools: Archon, Claude Code, Cursor, Codex, LangChain, git worktrees

AI Tools Industry

Better Stack

Symphony Explained: OpenAI's "Build Your Own Harness" Spec

Better Stack does the deep dive on OpenAI's Symphony — the open-source orchestrator OpenAI released last month — and frames it as the strangest install process in OSS history: instead of cloning a repo, you hand your coding agent a 2,000-line spec file and tell it to build Symphony from scratch.^{[5]Better Stack — Why OpenAI Built Symphony and Gave It Away for Free} The result: no two installs look alike. Better Stack got a Python version; someone else built a Go/charm CLI version; someone else built one on the Claude SDK.

Why Symphony exists

~00:30 OpenAI hit a "fast agents, slow humans" bottleneck — engineers could only supervise 3–5 concurrent Codex sessions before context-switching killed productivity. Symphony's fix: humans put tickets in Linear, Symphony polls for "to do" status, spins up a Codex worker per task, and only re-involves the human at review time.^{[5]Better Stack — Why OpenAI Built Symphony and Gave It Away for Free}

The build-it-yourself install

~03:00 Better Stack's read on the spec-not-repo distribution model: "wild because if everyone went down this route, no two versions of Symphony would look the same — chaos for OpenAI to maintain. But it's also kind of genius because if you built your own version, you'd feel responsible for it." That's exactly the play — OpenAI ships the spec, the community ships the implementations.^{[5]Better Stack — Why OpenAI Built Symphony and Gave It Away for Free}

Workflow hooks for real projects

~04:00 Out of the box, Symphony writes to a local workspace dir per issue. Real-world usage requires a create_after hook (clone the repo, branch) and a run_after hook (stage, commit, push, open PR). Symphony is "the pip harness" — Maltego and Conductor remain the closed-source full-featured options.^{[5]Better Stack — Why OpenAI Built Symphony and Gave It Away for Free}

Tools: Symphony, Codex CLI, Linear API, Maltego, Conductor, charm CLI, uv

AI Tools

AICodeKing

Verdant Manager: AI CTO With Slack/Telegram and Long-Term Memory

AICodeKing reviews Verdant's new Manager feature, which positions itself as an "AI CTO" sitting above the normal coding-agent loop.^{[6]AICodeKing — Crazy Auto FULLY FREE AI Coder} You hand it an outcome ("build a waitlist app with landing page, email capture, admin view, deployment"); Manager decomposes into phases, dispatches workers in parallel onto a board, and reports back. Bonus: it remembers your stack preferences across projects, plugs into Slack/Telegram, and ships with both Eco mode (cheap models for iteration) and BYOK (your Anthropic/OpenAI/OpenRouter keys).

Where Manager actually beats raw Claude Code

~03:00 The decomposition itself: "build a baby-product recommendation app" gets split into requirements → UI structure → recommendation logic → email capture → validation → testing, with workers dispatched in parallel rather than one sequential conversation. ~04:00 Long-term memory means stack preferences (TypeScript + Tailwind + Supabase + Vercel, tests-before-deploy) get applied automatically — no more "explain your stack again on conversation three."^{[6]AICodeKing — Crazy Auto FULLY FREE AI Coder}

The Slack/Telegram angle

~05:00 Manager exposes itself as a chatbot in Slack/Telegram so you can dispatch work from a meeting or your phone: "deploy the latest version to staging and report back." The reviewer's caveat: yes, you'd still review big changes — but for landing pages, internal tools, PR summaries, and prototypes, the "send work where you are" UX is the actual unlock.^{[6]AICodeKing — Crazy Auto FULLY FREE AI Coder}

Cost controls

Eco mode swaps in cheaper models for long iterative sessions; BYOK lets power users route through their own provider keys (advanced features sometimes don't support BYOK). The combination is the real story — high-quality models when you need them, eco for exploration.^{[6]AICodeKing — Crazy Auto FULLY FREE AI Coder}

Tools: Verdant Manager, Slack, Telegram, Anthropic API, OpenRouter, Vercel, Supabase

AI Models Industry

AI Engineer

Voice AI Day at AI Engineer: Why TTS Now Looks Like an LLM

Mistral AI scientist Samuel Humeau presents the architectural convergence in text-to-speech: every serious TTS system in 2026 is now an autoregressive decoder backbone over 80ms audio frames, encoded via a learned codec that compresses ~200 kbps audio down to ~500 tokens/sec.^{[7]AI Engineer — Why TTS Models Now Look Like LLMs — Samuel Humeau, Mistral} He drops Mistral's first open-source TTS model with 17ms first-audio latency on a single GPU and live-clones his own voice mid-talk from a 10-second sample.

The standard architecture

~09:00 Encoder turns audio into tokens (12 frames/sec × ~37 tokens per frame = ~500 tokens/sec for Mistral); autoregressive backbone (4B params) predicts one frame's worth of tokens at a time; a small "dev transformer" recomputes all 37 tokens of that frame in parallel. Mistral specifically uses flow matching (diffusion-style) for the inner block rather than vanilla autoregressive.^{[7]AI Engineer — Why TTS Models Now Look Like LLMs}

Latency math

~15:00 For agent use the actual UX win isn't the total generate time — it's first-audio-packet latency, because you start playing while you generate. 17ms TTFA on a single GPU lets the perceived latency stay below 200ms even for long-form output, especially when the LLM is also streaming text in.^{[7]AI Engineer — Why TTS Models Now Look Like LLMs}

The voice cloning asterisk

~18:00 Mistral released the model weights open-source but not the encoder used to extract a new voice fingerprint — they kept that proprietary specifically to not hand out arbitrary-voice cloning. Mistral hints this is a temporary stance and that "vocal identity" is becoming the next branding asset, like a website style guide.^{[7]AI Engineer — Why TTS Models Now Look Like LLMs}

Tools: Mistral TTS (open-source), flow matching, audio codecs, voice cloning

AI Models Hot Take

AI Engineer

Gradium's Phonon: On-Device TTS, and Why "Her" Is Still Far Off

Neil Zeghidour, CEO of Gradium AI (the for-profit arm of the Moshi lab funded by Eric Schmidt, Xavier Niel, and Rodolphe Saadé), gives the most honest "where are we vs. Her" talk of the conference.^{[8]AI Engineer — Voice AI: when is the "Her" moment? — Neil Zeghidour, Gradium AI} His thesis: TTS latency is now a distraction; the new bottlenecks are tool calls (500ms–4s) and the fact that every speech-to-speech model except Moshi is still half-duplex. Gradium also ships Phonon, an under-100M-param TTS model that runs on a smartphone CPU — voice without the API bill.

Half-duplex is the real ceiling

~09:00 Even OpenAI Advanced Voice and Cerebras's voice model are half-duplex — the model is either listening or speaking, never both. Human conversation has up to 20% overlap (back-channeling, "mhm," coughs), especially in Japanese where back-channeling is politeness. Zeghidour demos a half-duplex model being passive-aggressively over-polite: every interruption gets a "sorry, please continue" loop.^{[8]AI Engineer — when is the "Her" moment?}

Moshi vs. modern speech-to-speech

~11:00 Moshi (2 years old now) remains the only full-duplex production-ish model — partner Alex's demo of plotting a course to "Sirius 22" shows the AI starting to answer before the question ends, while still hearing follow-ups. But Moshi has no tool calls, no observability, no paralinguistic understanding — useless for production. The takeaway: you can have natural-sounding voice OR reliable agentic voice, not both. Yet.^{[8]AI Engineer — when is the "Her" moment?}

Tool calls are the new latency boss

~06:00 Zeghidour: "We're fighting for 10–20ms on the TTS, then a tool call or OpenRouter hop adds 500ms–4s." Gradium's pattern: train the LLM to emit a "filler" utterance while the tool call is running, then weave the result in naturally. Live demo of a "Wanderlust Travel" agent saying nice things about Tokyo while the booking lookup completes.^{[8]AI Engineer — when is the "Her" moment?}

Phonon: on-device TTS for consumer apps

~17:00 Sub-100M-param TTS that runs on a phone CPU (not a gamer GPU). Pitch: hyperscaler voice APIs are run at a loss as a marketing play, and consumer voice apps burn through fundraising on TTS bills before user growth kicks in. Phonon lets you ship voice without metering. Private beta now.^{[8]AI Engineer — when is the "Her" moment?}

I think it's completely false that voice is a commodity. The last mile is going to be the most difficult to solve. — Neil Zeghidour, Gradium

Tools: Gradium Phonon, Moshi, OpenAI Advanced Voice, Cerebras voice, Kokoro, OpenRouter

AI Tools Developer Tools

AI Engineer

ElevenLabs Voice Engine: Wrap Any Chat Agent in Three Lines

Luke Harries, ElevenLabs' Head of Growth, previews Voice Engine — coming in a few weeks — which wraps any existing chat agent in voice without making you rebuild on a new platform.^{[9]AI Engineer — Give Your Chat Agent a Voice — Luke Harries, ElevenLabs} The pitch: most teams already built their chat agent + RAG + tool calling + evals — Voice Engine wraps that with Scribe (speech-to-text) + V3 (text-to-speech) + emotion-aware turn-taking, with about three lines of server SDK glue.

The 2025 chat-agent inflection

~00:30 Harries opens with the Linear/PostHog/SEO chat-as-homepage trend ("you either died a SaaS or became AI-first by adding a chat agent") and the gov.uk chat-first redesign. His claim for 2026: chat agents either get a voice or they die.^{[9]AI Engineer — Give Your Chat Agent a Voice}

The wrap pattern

~03:00 Server SDK: create a client, create a voice engine, attach it to your existing chat agent. Each new session proxies audio in/out while your existing tool calls and RAG run server-side. Tool calls largely stay on your side — Voice Engine adds optional client-side tools for DOM manipulation. Ships with a skill so you can ask a coding agent to wrap your repo in one prompt.^{[9]AI Engineer — Give Your Chat Agent a Voice}

Why it's "omni-channel"

~01:30 Once your agent has voice, it can join a Zoom call (PostHog example: correct your stats mid-meeting), pick up a phone line for support, or sit in a Shadcn-styled widget on your site. Telephony and CSAT come "out of the box" once the wrap is in place.^{[9]AI Engineer — Give Your Chat Agent a Voice}

Tools: ElevenLabs Voice Engine, Scribe, V3, Shadcn, Vercel, Revolut customer support

Podcast

Dwarkesh Patel

Dwarkesh Interviews David Reich: Why Humans Stopped Evolving Smarter

Dwarkesh's latest is Harvard ancient-DNA geneticist David Reich on the punchline finding from his lab: genetic variants associated with cognitive-test performance show a 2-standard-deviation selection signal between 4,000 and 2,000 years ago — and basically zero selection since.^{[10]Dwarkesh Patel — Why Humans Stopped Evolving Smarter 2,000 Years Ago - David Reich} Broader interview covers Neanderthal/Denisovan gene flow, Yamnaya replacement of European farmers ("90% of them are gone"), and Yersinia pestis showing up in 5–10% of ancient DNA samples 4–5K years ago — likely a hidden lever in population turnover.

Areas of interest

Selection on intelligence stopped 2K years ago. Reich's preview clip: signal "maxes out in the Bronze Age between 5,000 and 2,000 years ago, and the impact in the last 2,000 years is almost nothing." Bias-going-in expectation was that industrialization would increase selection pressure; data says no.^{[10]Dwarkesh — David Reich preview clip}
Repeated demographic replacement. Steppe pastoralists replaced ~90% of European farmer ancestry. Reich's framing: "Human history has been again and again a story of one group figuring 'something' out, and then basically wiping everyone else out."^{[11]Dwarkesh — David Reich episode page}
Disease as hidden lever. Yersinia pestis (bubonic plague) DNA shows up in 5–10% of ancient samples 4,000–5,000 years ago — likely involved in population disruptions that enabled the replacements.^{[11]Dwarkesh — David Reich episode page}
The 60K year ago expansion. "Something happened 60,000 years ago" enabling African populations to expand globally — Reich's bet is cultural information storage and social learning, not cognitive differences.^{[11]Dwarkesh — David Reich episode page}
What's still missing. "We need old DNA from 50,000 years ago — from all over Africa" to understand human evolutionary braiding.^{[11]Dwarkesh — David Reich episode page}

It's very tempting to think that something innate makes it possible for these African lineages to spread into Eurasia. It's just complicated. — David Reich

Podcast Industry

Acquired

Acquired: Why Ferrari Always Delivers One Car Less

Acquired's latest episode is on Ferrari — the paradox of a company that's sold roughly 330,000 cars across 79 years at an average ~$500K each, while also running a Formula 1 team beloved by 400M fans.^{[12]Acquired — Ferrari will always deliver one car less than the market demand} The clip pulls out the famous Enzo Ferrari operating principle that's still the strategy in 2026: "Ferrari will always deliver one car less than the market demand."

The hyperscarcity case

Hermès sells comparable unit volumes to Ferrari in two years. Rolex does it in three months. Ferrari's lifetime production is the constraint that prices its used cars at hundreds of millions and lets the company operate increasingly like a heritage luxury brand rather than an automaker.^{[13]Acquired — Ferrari episode page}

The 166 footnote

Enzo's first car was the 166 — fewer than 100 produced. The Acquired clip's nuance: "There was demand for a lot more than 101, but they could only make 100. The fact that it became a business strategy was retconned later." Hindsight legend, but real strategy now.^{[12]Acquired — Ferrari clip}

Companies: Ferrari, Hermès, Rolex, Formula 1 (Scuderia Ferrari)

AI Models

Anthropic Research

Anthropic Research: Teaching Claude Why (0% Blackmail Rate)

Anthropic's May 8 research post on agentic misalignment: training Claude to explain why certain behaviors are better — not just demonstrate them — turned out to be 28× more sample-efficient than direct honeypot-matching, and it generalized.^{[14]Anthropic Research — Teaching Claude why} Every Claude model from Haiku 4.5 onward now scores 0% on the blackmail eval (down from up to 96% in earlier versions). Constitutional-document training plus aligned fictional narratives cut blackmail from 65% to 19%, and the gains held through later RL.

The "difficult advice" dataset

The training data isn't more blackmail honeypots — it's ethical-dilemma user queries paired with constitution-aligned model responses. Out-of-distribution by design. Result: Claude learns the principles, not the test.^{[14]Anthropic Research — Teaching Claude why}

Why this matters now

Sits in the broader Anthropic May 7–8 research drop alongside Natural Language Autoencoders (covered in May 7's briefing) and the Institute agenda. The narrative throughline: alignment is becoming about teaching values that transfer, not patching specific failure modes.^{[14]Anthropic Research — Teaching Claude why}

Models: Claude Haiku 4.5, Claude Sonnet, Claude Opus 4.7

Developer Tools Hot Take

Simon Willison Simon Willison

Simon Willison: HTML Beats Markdown for Claude Output

Simon Willison's May 8 post — still being widely shared on May 9 — argues we should stop asking Claude (and friends) for Markdown by default and start asking for HTML.^{[15]Simon Willison — Using Claude Code: The Unreasonable Effectiveness of HTML} The reasoning: Markdown defaults date from the GPT-4 era's 8,192-token output ceiling. With modern long-output models, HTML unlocks SVG diagrams, in-page nav, callout boxes, comparison tables, and interactive widgets — and the page reads dramatically better.

The copy.fail example

Willison's test case: feed Claude the obfuscated Python from the copy.fail Linux supply-chain exploit and ask for "HTML, neatly styled, with rich and interactive explanation." The result has safety callouts, a comparison table of suspicious patterns, and structured sections — vastly more useful than the same content as a markdown wall.^{[15]Simon Willison — Unreasonable Effectiveness of HTML}

Plus: Luke Curley on WebRTC's dropped packets

Willison's May 9 quote-post highlights OpenAI engineer Luke Curley (ex-Discord) on why WebRTC is the wrong protocol for AI conversation: it's hard-coded to prefer low latency over recoverability, so packets get dropped during congestion with no retransmission inside browsers. Useful framing for anyone shipping voice agents.^{[16]Simon Willison — Quoting Luke Curley}

WebRTC is designed to degrade and drop my prompt during poor network conditions. The implementation is hard-coded for real-time latency or else. — Luke Curley

Industry Hot Take

Caleb Writes Code

Caleb's Read on the Anthropic–SpaceX Frenemy Deal

Caleb adds a useful frame to the Anthropic–SpaceX Colossus 1 deal we covered in May 7's briefing: the 5-hour rate-limit doubling for Pro/Max users is the headline, but the weekly limit is unchanged.^{[17]Caleb Writes Code — Anthropic rents SpaceX?} Translation: this is burst-capacity stabilization for peak hours, not a "you have more tokens now" expansion. The deeper point: Anthropic is the least vertically integrated frontier lab — every layer (chips, infra, energy) is rented — which makes per-user retention more strategic than user growth.

The five-layer cake framing

~01:00 Caleb riffs on Jensen's five-layer cake (apps → models → infra → chips → energy): Anthropic is strongest at the model/safety layer, started dabbling in agentic apps in 2025 (Claude Code, Claude Cowork), and rents the bottom three from competitors — Google TPU, AWS Trainium, Nvidia, plus AWS/Azure/Google for hosting. The SpaceX deal adds another tenant relationship; the actual Anthropic-owned data centers don't come online until late 2026.^{[17]Caleb Writes Code — Anthropic rents SpaceX?}

Peak-throttle removal is the real win

~03:00 Removing peak-time throttling for Pro/Max is what users will actually feel. The 220K GPUs at Colossus 1 absorb spillover demand. API users get bumped limits too. Caleb's read: per-user usage is now the strategic focus, not user growth.^{[17]Caleb Writes Code — Anthropic rents SpaceX?}

"Orbital data center" is not just hype

~05:00 Caleb's contrarian take on the SpaceX orbital-compute mention: it's not totally unrealistic if launch costs hit ~$100/kg. He covered the math in a prior video. The skeptical asterisk: "interest" doesn't equal commitment — see Nvidia walking back its supposed $1B OpenAI investment around the time OpenAI started chip-supplier diversification.^{[17]Caleb Writes Code — Anthropic rents SpaceX?}

Companies: Anthropic, SpaceX/xAI, Colossus 1, Google TPU, AWS Trainium, Nvidia, Cursor, OpenClaw

Developer Tools Industry

Better Stack

Supply-Chain Attack: 4 SAP CAP NPM Packages Poisoned

Better Stack short: on April 29, four official SAP CAP NPM packages — combined ~570K weekly downloads — were poisoned with credential-stealing code for two to four hours via a pre-install script.^{[18]Better Stack — One npm install just stole cloud secrets} Payload installed bun, then hunted for NPM tokens, GitHub creds, AWS/Azure/GCP secrets, and browser passwords. Repo signature: `mini-shai-hulud`.

Detection and remediation

Run npm audit and npm ls against the four affected packages. If you see a bad version: delete node_modules and lockfile, reinstall clean, rotate every secret, and enable 2FA + token expiration. Going forward: npm install --ignore-scripts in CI, pin exact versions, monitor dependency changes.^{[18]Better Stack — npm install stole cloud secrets}

Developer Tools AI Models

Github Awesome Github Awesome

Local Inference: Salvatore Sanfilippo's ds4 for DeepSeek V4

Redis creator Salvatore Sanfilippo dropped ds4: a native ultra-optimized local inference engine for DeepSeek V4 Flash on Apple Silicon via Metal.^{[19]Github Awesome — ds4: Redis creator's DeepSeek V4 inference engine} The trick: treat the SSD as a first-class citizen for the KV cache, streaming conversation context to disk instead of consuming unified memory. Restart the server or swap sessions and you resume exactly where you left off without re-prompting thousands of tokens.

And separately: zero-native

Vercel Labs shipped zero-native, a Zig desktop shell that calls C directly so your web frontend gets platform-SDK access with zero glue. Use the native OS WebView for a tiny binary or bundle CEF Chromium for pixel-perfect consistency. Auto-scaffolds Next.js, React, Svelte, or Vite with native build paths for macOS, Linux, and Windows. The pitch: "web tech desktop apps without the Chromium tax."^{[20]Github Awesome — zero-native Zig desktop shell}

Industry Hot Take

Tech Brew Tech Brew

Emotion AI Hits White-Collar Workplaces

Tech Brew's May 8 story tracks emotion-detection AI moving from call centers and drive-thrus into office knowledge work, with minimal regulation and no employee disclosure required in most U.S. states.^{[21]Tech Brew — Workplace surveillance gets an emotional upgrade} The emotion-AI market is on track for $9B by 2030 (triple today's size). The EU banned workplace emotion AI except for medical/safety use; the U.S. has nothing equivalent.

What's deployed today

MorphCast — real-time emotion analysis with Zoom integration for meetings.
Slack Aware — chat sentiment and toxicity monitoring.
MetLife — voice analysis on call center agents.
Burger King's "Patty" — OpenAI-powered headset that rates drive-thru "friendliness."
Several startups analyzing emotional states of job interview candidates.

The article's central tension: emotion detection has limited construct validity (Americans only scowl when angry ~⅓ of the time, concentration registers as anger), but it's being deployed as if it works. The kicker: "Emotion AI in particular adds a whole line item to your job description: convincing a bot you're always cheerful."^{[21]Tech Brew — Emotion AI workplace surveillance}

Joanna Stern's year without Google

Same Tech Brew issue covers WSJ's Joanna Stern reporting her year-long experiment of abandoning Google for ChatGPT, Perplexity, Gemini, and Claude as her sole search interface.^{[22]Tech Brew — Joanna Stern's Great Gen AI Experiment} Her takeaways: AI search became default quickly; the multimodal magic (text + image + audio + video in one query) was the unlock; she got confidently wrong answers on physical-world stuff (the misdiagnosed garage door); she's keeping the new habit, but dropped generative AI for creative work.

The real magic of AI search was the multimodal part — combining audio, images, video, and text in one query. — Joanna Stern

Industry

Morning Brew

Canvas Hacked: ShinyHunters Claim 275M Student Records

Instructure, parent of Canvas (used by ~half of North American colleges and universities), pulled all Canvas sites offline for several hours Thursday during finals after the ShinyHunters group claimed access to data on 275M people across 8,800 universities and K-12 schools globally.^{[23]Morning Brew — Canvas cyberattack shuts down schools' sites} Penn State cancelled exams; Harvard students reportedly saw a ransom message on the login page. Some users got an exam-week extension; ShinyHunters got a May 12 ransom deadline.

Timeline

May 1: Instructure first flagged a cybersecurity incident — student IDs, names, emails, and intra-Canvas messages exposed.
Thursday May 7: ShinyHunters take credit; Canvas down for several hours; some schools cancel Thu/Fri exams.
May 12: Hackers' deadline for school officials to settle or face data leak.

Big-picture: ShinyHunters has previously hit Microsoft, Ticketmaster, and Salesforce. TechCrunch cautions hacker groups sometimes inflate impact numbers for ransom leverage.^{[23]Morning Brew — Canvas cyberattack}

AI Tools Developer Tools Industry

Nate B Jones Arjay McCandless Better Stack Real Python Morning Brew Sherwood Snacks The Batch

Quick Hits: Capability Gap, Personal Projects, JS Temporal, Jobs Report, Seedance

Saturday's smaller stories — none of them standalone topics, but most worth a glance.

The capability dissipation gap

Nate B Jones short: "the valuable thing to figure out is where you sit relative to the exponential curve and the flat curve." His thesis — AI fluency in your domain is a compounding asset; every new model lands on top of practical foundations that took time to build, so the people building the harnesses (see Topic 1) widen the gap, not narrow it.^{[24]Nate B Jones — Frontier vs Comfortable}

Arjay's 15-step personal project playbook

Arjay McCandless walks the resume-project framework: pick problems from minor annoyances, hobbies, or genuine curiosity; one-sentence test; MVP in 2–4 weekends; you must be the ideal user. Stack: React + Postgres + managed auth (Auth0/Supabase/Clerk) + free-tier hosting; build "walking skeleton" first; vertical scaling before horizontal; build-vs-buy biases to buy 99% of the time; "ship the bugs."^{[25]Arjay McCandless — how to plan personal projects that will actually get you hired}

JavaScript Temporal API hits Stage 4

Better Stack covers JavaScript's Temporal API reaching Stage 4 in March 2026, with Chrome/Firefox/Edge already shipping. Replaces 25 years of Date() pain (month-zero indexing, time-zone hacks). New primitives: plainDate, zonedDateTime, Duration (immutable date math), instant (true UTC timestamps), built-in parsing. Polyfill only for old Safari.^{[26]Better Stack — JavaScript Finally Fixed Dates with Temporal API}

"The Zen of GitHub"

Real Python's clip discusses the existence of a "Zen of GitHub" (analogous to Zen of Python/Ruby), highlighting responsive UI, frequent shipping, and rare-and-short-lived outages. "GitHub going down is always an event in the developer community."^{[27]Real Python — The Zen of GitHub}

April jobs report (released Friday May 8)

U.S. employers added 115,000 jobs in April — roughly double estimates — against an unemployment rate of 4.3%. Healthcare led; "information employment" (tech) lost more jobs and is now down 11% from its Nov 2022 peak. Wage growth 3.6% (below 4.2% expected inflation). Largest two-month payroll gain since 2024, but consumer sentiment hit another record low on gas prices. CPI next week will be the bigger Fed signal.^{[28]Morning Brew — April jobs report shows solid gains but some potential red flags}

Whirlpool's "recession-level" warning

Sherwood Snacks (May 8): Whirlpool fell ~12% after warning Iran-war-driven appliance demand was at GFC levels. Q1 US appliance demand down 7.4%; sales down 10% YoY. Same issue: OpenAI–Broadcom's $18B chip deal hit a financing snag — Broadcom asked Microsoft to commit to 40% offtake or OpenAI find alternate buyers before agreeing to absorb more upfront cost.^{[29]Sherwood Snacks — Even Whirlpool's got war woes}

The Batch issue 352 (May 8): four stories

ByteDance's Seedance 2.0 video model shipped on CapCut to hundreds of millions; ranks top on independent leaderboards; faces Hollywood copyright disputes. Nvidia's NVCell and PrefixRL produce circuit designs "20–30% better than human designs" and turn months-long projects into overnight runs (Bill Dally). Gallup: 50% of U.S. workers used AI at work in 2025; 65% report productivity gains. UT Austin + UCLA RL+LoRA method lets robots learn tasks sequentially without catastrophic forgetting — 81.2% on benchmark.^{[30]The Batch Issue 352 — Seedance, Nvidia chip design, AI at work, robot sequential learning}

The Harness Mental Model: Prompts, Skills, Plugins, MCPs, Hooks

The decision tree

The plugin is the right unit of work

Why this is a non-engineer's job

The Claude Design tell

Printing Press: CLI Factory That Beats APIs and MCPs by 35×

Why APIs and MCPs both lose for agents

What's in the library

The factory part

Goal-Mode Coding: Codex /goal, Hermes persist, and Mission Loops

Why dumb for-loops fail and judge-loops work

Activation and best-practice prompt shape

Vincent (OpenClaw) field notes

Goalbody: scaffolding the goal prompt itself

Missions: when hours aren't enough

Archon: YAML DAGs and Git Worktrees for Deterministic Agents

The three changes

How it compares

Symphony Explained: OpenAI's "Build Your Own Harness" Spec

Why Symphony exists

The build-it-yourself install

Workflow hooks for real projects

Verdant Manager: AI CTO With Slack/Telegram and Long-Term Memory

Where Manager actually beats raw Claude Code

The Slack/Telegram angle

Cost controls

Voice AI Day at AI Engineer: Why TTS Now Looks Like an LLM

The standard architecture

Latency math

The voice cloning asterisk

Gradium's Phonon: On-Device TTS, and Why "Her" Is Still Far Off

Half-duplex is the real ceiling

Moshi vs. modern speech-to-speech

Tool calls are the new latency boss

Phonon: on-device TTS for consumer apps

ElevenLabs Voice Engine: Wrap Any Chat Agent in Three Lines

The 2025 chat-agent inflection

The wrap pattern

Why it's "omni-channel"

Dwarkesh Interviews David Reich: Why Humans Stopped Evolving Smarter

Areas of interest

Acquired: Why Ferrari Always Delivers One Car Less

The hyperscarcity case

The 166 footnote

Anthropic Research: Teaching Claude Why (0% Blackmail Rate)

The "difficult advice" dataset

Why this matters now

Simon Willison: HTML Beats Markdown for Claude Output

The copy.fail example

Plus: Luke Curley on WebRTC's dropped packets

Caleb's Read on the Anthropic–SpaceX Frenemy Deal

The five-layer cake framing

Peak-throttle removal is the real win

"Orbital data center" is not just hype

Supply-Chain Attack: 4 SAP CAP NPM Packages Poisoned

Detection and remediation

Local Inference: Salvatore Sanfilippo's ds4 for DeepSeek V4

And separately: zero-native

Emotion AI Hits White-Collar Workplaces

What's deployed today

Joanna Stern's year without Google

Canvas Hacked: ShinyHunters Claim 275M Student Records

Timeline

Quick Hits: Capability Gap, Personal Projects, JS Temporal, Jobs Report, Seedance

The capability dissipation gap

Arjay's 15-step personal project playbook

JavaScript Temporal API hits Stage 4

"The Zen of GitHub"

April jobs report (released Friday May 8)

Whirlpool's "recession-level" warning

The Batch issue 352 (May 8): four stories

Sources