May 13, 2026
For the first time, Anthropic has overtaken OpenAI in enterprise adoption — 34.4% vs 32.3% in April[1]Nate Herk — Anthropic Just Dethroned OpenAI. Within 45 minutes of the data dropping, Sam Altman offered Claude Code switchers two months of free Codex; Anthropic counter-punched with a 50% raise to Claude Code weekly limits through July 13[1]Nate Herk. But under the celebratory headline is a stealth squeeze: Anthropic also announced that starting June 15, programmatic and "AFK" workloads (Agent SDK, claude -p, GitHub Actions) will draw from a separate dedicated credit — $200/mo on Max 20x, which third parties peg as a ~25x reduction in AFK capacity for power users[2]Matt Pocock — Anthropic's "dedicated monthly credit" is actually a huge cut. Meanwhile Anthropic is courting the next adoption frontier with Claude for Small Business — 15 ready-made agentic workflows wired into QuickBooks, PayPal, HubSpot, Canva, DocuSign and Google/MS Workspace[3]Anthropic — Introducing Claude for Small Business. On top of that, both labs voided unauthorized SPVs and tokenized stock instruments, crashing Anthropic's gray-market price ~50%[4]AI Daily Brief — secondary-market crackdown.
Anthropic rose 3.8 points in April to 34.4% business penetration; OpenAI dropped 2.9 points to 32.3%[1]Nate Herk. Altman's tweet offered Claude Code switchers free Codex; Anthropic's response landed in under an hour ~00:00. Nate Herk's frame: this is the free-sample phase.
You are not the customer. You are the training data. You're not paying 200 bucks a month for AI. You're paying 200 bucks for 12 to 24 month exemption from real prices.
Starting June 15, Pro plans get $20/mo, Max 5x gets $100/mo, Max 20x gets $200/mo in dedicated programmatic credits — covering the Agent SDK, claude -p, and Claude GitHub Actions[2]Matt Pocock. Matt Pocock points out the Max 20x plan previously delivered roughly $5,000/mo of API-equivalent AFK throughput; the new credit is ~1/25th of that. Human-in-the-loop usage (Claude.ai, terminal/IDE Claude Code, Co-work) is unchanged. He's now buying OpenAI to run AFK workloads on Codex, where there is no split.
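The "~1/25th" figure follows directly from the two numbers quoted above. A quick check of the arithmetic, assuming Pocock's $5,000/mo estimate (his figure, not an official Anthropic number):

```python
# Figures quoted in the text. The $5,000 value is Matt Pocock's estimate.
PREVIOUS_AFK_VALUE_USD = 5_000  # estimated API-equivalent throughput on Max 20x
NEW_CREDITS_USD = {"Pro": 20, "Max 5x": 100, "Max 20x": 200}


def reduction_factor(previous_usd: float, new_usd: float) -> float:
    """How many times smaller the new allowance is than the old one."""
    return previous_usd / new_usd


factor = reduction_factor(PREVIOUS_AFK_VALUE_USD, NEW_CREDITS_USD["Max 20x"])
print(f"Max 20x: ~{factor:.0f}x less programmatic capacity")
```

The comparison only covers Max 20x because that is the only tier for which the source gives a "before" estimate.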
Anthropic's SMB push includes 15 pre-built workflows — payroll planning, monthly close with QuickBooks/PayPal, lead triage, campaign attribution, HR and customer service — plus a free AI Fluency course co-developed with PayPal and a 10-city Claude SMB Tour starting May 14[3]Anthropic — Claude for Small Business.
Claude helps take the late-night work off their plates. — Daniela Amodei
Anthropic and OpenAI both updated docs to explicitly void SPV-based and tokenized transfers of their stock, naming offending firms[4]AI Daily Brief. Anthropic's gray-market price fell roughly 50% on the news, exposing a wider problem: as labs stay private longer, layered SPVs and tokenized receipts have proliferated and retail buyers may be left holding worthless paper.
OpenAI officially spun up the OpenAI Deployment Company ("Deploy Co.") — a consulting JV with 19 partners, $4B raised at a $10B pre-money valuation, led by TPG with Advent, Bain Capital, and Brookfield as co-lead founding partners[4]AI Daily Brief — Deploy Co. launch. The vehicle is built around acquiring engineering firm Tomorrow (~150 staff). Goldman Sachs has backed both Deploy Co. and Anthropic's still-unnamed equivalent — a tell that no single lab can absorb enterprise demand alone.
The episode framed Deploy Co. ~01:00 as OpenAI's answer to the gap between model capability and enterprise readiness — partners include Advent International, Bain Capital, Brookfield, plus consulting and private equity firms. The consensus on the call: customers need an entire support stack (org change, data engineering, ops re-design) to make agentic AI actually land, and OpenAI doesn't want to build that linearly inside its own walls.
Mira Murati's Thinking Machines Lab released a new model architecture aimed at real-time human-AI collaboration: 200ms micro-turns, parallel input/output streams, paired with a background reasoning model for agentic tasks[4]AI Daily Brief — Interaction Models. The Daily Brief framed it as the "GUI moment" for AI — moving past the request-response chat box to a model that can interrupt, listen, and act continuously.
The headline change is architectural ~11:05: most LLMs are call-and-response, but TML's interaction model fires every ~200ms whether or not the user has said anything, with separate parallel streams for incoming audio/video and outgoing speech/tools. A background "thinker" handles long-horizon reasoning. The bet: continuous interaction unlocks UX patterns chat never could — interrupting, anticipating, course-correcting mid-action.
OpenAI engineered a custom multi-binary sandbox for Codex on Windows after determining that AppContainer, Windows Sandbox, and MIC labeling were all unsuitable for open-ended agentic developer workflows[5]OpenAI Blog — Codex Windows sandbox. The final design dedicates two local Windows users (online/offline), enforces network restrictions via Windows Firewall, and uses write-restricted tokens with synthetic SIDs to bound filesystem damage.
Iteration one — the "unelevated sandbox" — ran commands as the real user with a write-restricted token, synthetic SIDs, and proxy-env tricks (GIT_SSH_COMMAND, PATH stub binaries) to suppress network. It was too slow to set up dynamically and network blocking was advisory only — a child process opening sockets directly bypassed it.
Iteration two — the elevated sandbox — splits the system into four binaries: codex.exe, a one-time elevated setup binary (codex-windows-sandbox-setup.exe) that provisions CodexSandboxOffline/CodexSandboxOnline users with DPAPI-encrypted credentials and Windows Firewall rules, a privilege-bridge runner (codex-command-runner.exe) that bridges CreateProcessWithLogonW and CreateProcessAsUserW, and the actual child. Each layer exists because Windows lacks a single primitive that maps to "safe autonomous coding agent"[5]OpenAI Blog.
Windows did not hand us one primitive that cleanly maps to 'safe autonomous coding agent.' We composed several tools and concepts to build something coherent.
On May 11, attackers compromised TanStack (a widely-used JS library) as part of the "Mini Shai-Hulud" supply-chain wave. Two OpenAI employee devices ran the malicious package; investigation found limited credential exfiltration from a narrow slice of internal repos, including iOS/macOS/Windows code-signing certificates — but no customer data, production access, or IP touched[6]OpenAI Blog — TanStack response.
OpenAI is rotating all signing certs. macOS users must update ChatGPT Desktop, Codex App, Codex CLI, and Atlas before June 12, 2026 — after which builds signed with the old cert will be blocked by macOS notarization. Windows and iOS users need no action. The kicker: the attack landed during a phased rollout of supply-chain hardening controls (package-manager minimumReleaseAge, CI/CD credential hardening) — the two affected devices hadn't received the new configs yet.
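The idea behind a minimum-release-age control is simple: refuse to install a package version until it has been public long enough for a compromise to be noticed. A minimal sketch of that check, not any specific package manager's implementation; the three-day threshold and the timestamps are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def is_old_enough(published_at: datetime, now: datetime, minimum_age: timedelta) -> bool:
    """True if the release has been public for at least minimum_age."""
    return now - published_at >= minimum_age


# Hypothetical policy and timestamps, for illustration only.
MINIMUM_AGE = timedelta(days=3)
now = datetime(2026, 5, 12, 12, 0, tzinfo=timezone.utc)
fresh_release = datetime(2026, 5, 11, 9, 0, tzinfo=timezone.utc)
older_release = datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc)

print(is_old_enough(fresh_release, now, MINIMUM_AGE))  # False: too new to install
print(is_old_enough(older_release, now, MINIMUM_AGE))  # True: past the waiting period
```

A device without this gate installs whatever is newest, which is why the two machines that had not yet received the new configs were the ones exposed.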
This incident reflects a broader shift in the threat landscape: attackers are increasingly targeting shared software dependencies and development tooling rather than any single company.
OpenAI's Terry walked through three voice releases in one drop: a real-time translation model (70+ input / 13 output languages), GPT-realtime-whisper (sub-200ms streaming STT across 80 languages), and GPT-Realtime-2 — billed as their smartest voice model with GPT-5-class reasoning, 4x larger context (128k / ~1 hour), parallel tool calls, dynamic voice cloning, and controllable expressiveness[7]OpenAI Build Hour: GPT-Realtime-2.
Three architectural patterns enabled by the new models ~00:00: voice-to-action (hands-free voice apps), systems-to-voice (a voice chief-of-staff that turns dashboard data into speech), and voice-to-voice (T-Mobile-style global customer service).
NVIDIA dropped an open 30B-parameter multimodal model that handles images, video, and audio with linear (not quadratic) context scaling — processing nearly 10 hours of video per hour, roughly 3x faster than Qwen3 Omni and up to 7x faster on documents[8]Two Minute Papers — NVIDIA New AI Is An Efficiency Monster. Needs ~25GB VRAM; license allows commercial use and derivatives but isn't Apache 2.0.
The architectural moves: (1) Mamba layers that scale linearly with context length, so the advantage grows with input size; (2) direct audio tokenization that preserves emotion/tone without a Whisper-like front end; (3) 3D convolutions over blocks of video frames for compression; (4) one small encoder distilling three CLIP-style models; (5) duplicate-frame discard. Not top-tier for pure text reasoning or coding, but for video and audio understanding, the throughput is the story.
Hermes Agent 0.13 — codename "Tenacity" — focuses entirely on long-running reliability: durable multi-agent Kanban with heartbeats and zombie detection, a hallucination gate that catches agents falsely claiming task completion, persistent /goal to prevent drift, Checkpoints V2 with auto-resume, and eight P0 security fixes[9]AICodeKing — Hermes Agent 3.0.
Beyond reliability, the release adds Google Chat / Slack / Telegram / Mattermost / Matrix / DingTalk / IRC / Teams hooks; a pluggable provider directory (DeepSeek V4 Pro, xAI Grok 4.3, OpenRouter Owl Alpha free route, Tencent HY3 Preview); MCP improvements (SSE transport, OAuth forwarding, stale-pipe retries); a video analyze tool for Gemini; xAI custom TTS with voice cloning; i18n for Chinese, Japanese, German, Spanish, French, Ukrainian, and Turkish; and IDE integrations for Zed, VS Code, and JetBrains. Cron jobs can run as script-only watchdogs without invoking a model, saving cost.
The main theme is simple: reliability. This is not just about more features. It is about making the agent keep going when real workflows get messy.
Pinecone shipping Nexus + NoQL (a vector database company effectively conceding that vector search alone is insufficient) joined a chorus of infrastructure moves admitting the same thing: SAP's 1B+ euro acquisitions of Dremio and Prior Labs, Cloudflare's agent memory, Microsoft Graph RAG, Google's Cloud Next knowledge architecture[10]Nate B Jones — Pinecone Just Demoted Vector Search. Meanwhile Data Science Weekly argues that for most production data-science work, boring single-LLM-call automations still outperform fully autonomous agents[11]Data Science Weekly — Automations vs Agents.
Nate B Jones' synthesis ~01:00: agents on real tasks can burn up to 85% of compute on rediscovery — re-fetching documents the system already summarized — because classic RAG was built for chatbots (one question → three chunks → paragraph). Agents need assembled bundles: customer record + policy + entitlement + ticket history. Four infrastructure responses define 2026:
The durable principle: the retrieval unit must match the work. A chunk fits a FAQ. A section fits a filing. A table fits financial analysis. A graph neighborhood fits dependency reasoning. Pick the wrong shape and you force the model to compensate.
The Data Science Weekly thesis (subtitle): "Why boring workflows with one LLM call often outperform fully autonomous systems." For most data-science workflows, deterministic pipelines with a fixed LLM call are more reliable, auditable, and maintainable than multi-step autonomous agents.
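A "boring" automation of this kind might look like the sketch below: the statistics are computed deterministically and the model is called exactly once, only to phrase the result. The function names, the 2-sigma outlier rule, and the stand-in `fake_llm` are all assumptions for illustration, not anything from the Data Science Weekly piece:

```python
from typing import Callable


def summarize_outliers(values: list[float], llm: Callable[[str], str]) -> str:
    """Deterministic stats first, then exactly one LLM call to phrase the result."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    # Simple 2-sigma rule; an assumption chosen for brevity.
    outliers = [v for v in values if std > 0 and abs(v - mean) > 2 * std]
    prompt = (
        f"Mean={mean:.2f}, std={std:.2f}, outliers={outliers}. "
        "Write one sentence for a status report."
    )
    return llm(prompt)  # the only non-deterministic step


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model client so the sketch runs offline."""
    return f"[model output for: {prompt}]"


report = summarize_outliers([10, 11, 9, 10, 10, 11, 9, 10, 50], fake_llm)
print(report)
```

Because every number in the prompt comes from ordinary code, the pipeline can be audited and unit-tested without involving the model at all.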
Nate B Jones argues the differentiator between agents that work and agents that embarrass you isn't context engineering (what they know) — it's intent engineering: encoding organizational goals, values, trade-off hierarchies, and decision boundaries into infrastructure agents act against[12]Nate B Jones — I Built 2 AI Agents. Klarna is the cautionary tale: 2.3M customer conversations resolved in month one — optimized for resolution speed, not satisfaction. They had to rehire human agents.
Context engineering tells agents what to know. Intent engineering tells agents what to want.
Context engineering loads agents with project files, conventions, and constraints so simple prompts do complex work. Intent engineering sits above this — it constrains which outcomes the agent optimizes for. You can have perfect context and terrible intent alignment, but good intent alignment is impossible without good context (the agent needs information to act on intent).
DeepMind researcher Adrian demoed an experimental pointer with Gemini embedded: it interprets deictic language ("this", "that", "here") plus voice and visual context, fusing the cursor position, the underlying data layer, and your speech into a prompt on the fly[13]Google DeepMind — Reimagining the mouse pointer.
The demo shows head-tracking-driven pointing used to compose multimodal prompts — take the style from one image, the content from a restaurant menu, and generate a new image. The pointer knows what's behind the element it hovers (the data layer), not just the visual element, which lets it act semantically.
I imagine a new type of operating system, AI showing me content I might find useful, me pointing back at the content, sharing attention, and sharing the canvas like if I was working with another person.
Hugging Face's Merve Noyan walked through the open-agent stack on HF Hub: ~3M models, agentic-model filters, an inference-providers service with a tool-use column, a new traces dataset type that hosts Codex/Claude Code/Pi traces and lets you train on your own work, and HF skills that let coding agents train models, run jobs, and end-to-end OCR 30,000 papers via natural-language prompts[14]AI Engineer — Merve Noyan.
Noyan opens ~00:15 distinguishing open-weight (non-commercial) from open-source (MIT/Apache, e.g. DeepSeek) from "everything open" releases where harness and code ship too. She uses the recent Claude regression incident as motivation ~01:16: closed models silently regress; open weights let you quantize, fine-tune, or go on-device with privacy guarantees. On the Artificial Analysis Index, green (open) models are catching black (closed). GLM 5.1 is "crushing it"; she uses it in her own setup.
On the Hub ~02:17: the agentic-models filter splits VLMs (computer-use agents that work over screenshots and know where to click) from plain LLMs. Trend: top labs ship vision on day zero (Gemma 4 omni, Qwen 3.5, Kimi K 2.5), and you can serve them locally with a one-liner via vLLM, llama.cpp, or llama-server.
New surfaces: benchmark datasets ranking open models on SWE-bench Pro, Humanity's Last Exam, AIME (GLM 5.1 currently tops SWE-bench) ~04:19; inference providers (Groq, Cerebras) with a tool-use column ~05:21; traces dataset repo type ~08:25 — hosts Codex/Claude Code/Pi traces with a dataset viewer, lets you train on your own runs. Local coding agent recommendations: Pi (easy), llama-agent (binary, just pass an HF Hub ID), and her favorite — Hermes Agent.
Play Magnus (Magnus Carlsen's company) ships an AI Game Review that annotates moves "brilliant"/"blunder," draws threat arrows, and explains why each move was strong or weak[15]AI Engineer — Building a Chess Coach. The trick: LLMs hallucinate at chess, so Play Magnus separates analysis from explanation — Stockfish + tactical detectors + Maia (a human-rating-aware neural engine) feed structured context to the LLM, whose only job is translate-to-English.
The talk traces Claude Shannon's 1949 Type A/Type B distinction → Deep Blue → AlphaZero. LLMs themselves hallucinate moves; the speakers played a clip of Grok losing badly in a Kaggle LLM chess tournament that Magnus Carlsen commentated from Play Magnus's Oslo office.
Stockfish identifies best moves. A detector bank extracts forks, pins, skewers, doubled pawns, structural themes, threats, and plans. Maia (University of Toronto) predicts what a human of a given rating would actually play — so commentary can say not just "this is best" but "this is hard to find at your level." The LLM only converts structured info to prose, which keeps hallucinations bounded.
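The separation of analysis from explanation can be sketched as a data structure that the engines fill in and a prompt builder that hands only those facts to the LLM. This is an illustrative sketch, not Play Magnus's code; the field names, the 0.2 difficulty threshold, and the example values are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class MoveAnalysis:
    """Structured facts produced by engines and detectors, never by the LLM."""
    played: str
    best: str
    eval_loss_pawns: float  # how much worse the played move is than the best
    themes: list[str] = field(default_factory=list)
    human_find_probability: float = 0.0  # Maia-style estimate for the player's rating


def build_prompt(a: MoveAnalysis) -> str:
    """Turn verified facts into a prompt; the LLM is asked only to phrase them."""
    difficulty = "hard to find" if a.human_find_probability < 0.2 else "findable"
    facts = [
        f"Played move: {a.played}",
        f"Engine best move: {a.best}",
        f"Evaluation loss: {a.eval_loss_pawns:.1f} pawns",
        f"Detected themes: {', '.join(a.themes) or 'none'}",
        f"At this rating the best move is {difficulty}",
    ]
    return "Explain using ONLY these facts:\n" + "\n".join(f"- {x}" for x in facts)


# Hypothetical values standing in for real engine output.
example = MoveAnalysis("Qh5", "Nf3", 1.4, ["fork"], human_find_probability=0.1)
prompt = build_prompt(example)
print(prompt)
```

The LLM never sees the board, so it has nothing to hallucinate a move from; it can only rephrase what the analysis layer already established.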
When a user taps "bad commentary," the report posts to Slack and is pushed into a running Claude Code session via Channel (a research-preview MCP feature). A custom "commentary triage" skill investigates, modifies prompts, adds/changes detectors, regenerates the commentary, and verifies its own work before opening a PR.
Madison Faulkner (NEA partner, ex-Meta AI) and Hugo Santos (CEO of Namespace, ex-Google) argue agentic software is fragmenting from monolithic LLM workflows into microservice-style agent swarms — and the build/test/deploy lifecycle CI/CD was designed for can't survive the transition[16]AI Engineer — CI/CD Is Dead. Today's pipelines assume 1–2 PRs per developer per week; agents generate N PRs across N repos with thousands of short-lived branches.
Opening framing ~01:07: software has shifted from monolithic agents using an LLM as one engine to microservice-style agent architectures. The dev lifecycle now spans traditional CI/CD, AI IDEs, and autonomous agentic engineering — with DevOps in the middle expected to innovate hardest in the next year.
Why it breaks ~02:07: human-scale CI was built for one diff at a time with slow human review. Agents stress it with N parallel PRs across N repos. Verification still takes the same time unless you stack review bots — and you end up with thousands of branches pulling the codebase apart until merging is impossible. GitHub commit volume and lines-added data already show the spike ~04:08.
The proposed escape hatch ~04:08: treat the cache as the orchestration layer. Overlay acceleration on top of existing GitHub Actions, route work to the right hardware via hardware/software co-design, shape ingress, and stop treating build/test/deploy as a serial path. CI/CD doesn't die in name; it dies in shape.
Anders Hejlsberg walks Gergely Orosz through 40+ years of language design: Turbo Pascal in a 12K ROM, the IDE concept from day one, Delphi competing with Visual Basic, the Sun-vs-Microsoft Java lawsuit that birthed C#/.NET, async/await as a compiler-rewritten state machine, and the path to TypeScript[17]Pragmatic Engineer — Hejlsberg interview.
Late 1970s Copenhagen high school: HP 2100 with 32K ferrite core memory, paper-tape bootloader. Fortran, ALGOL, slow BASIC. Hejlsberg co-founded Copenhagen's first computer store, wrote a Pascal compiler that fit in 12K ROM you could swap in for the Microsoft BASIC ROM, evolved it to full CP/M-80 Pascal — Borland licensed it via royalty and shipped Turbo Pascal in 1983.
Named after the Audi Quattro/Turbo era: fast and interactive. "10x better at a tenth of the price" of $500 compilers: it sold for $49.95, with manuals worth the money on their own (which suppressed piracy). IDE concept from day one: compiler, editor, debugger, runtime library as a single cycle. Delphi extended this to Windows GUI/client-server, competing with Visual Basic by adding a real compiler, classes, and a VCL. Skype was built in Delphi and ran in production until roughly a year ago.
Hejlsberg joined Microsoft in 1996 to architect Visual J++. The Sun lawsuit killed J++ as a platform bet. The gap between Visual Basic (easy but slow) and C++/MFC (powerful but hard) led to C# + .NET designed in parallel — managed/bytecode, GC, exceptions, unified self-describing object system, properties/methods/events as first-class component primitives, standardized language. Six-to-seven language designers met three times a week for two hours; Hejlsberg wrote the spec while a separate C++ team implemented the compiler. Self-hosting came much later via Project Roslyn — which finally unified the CLI compiler and the IDE language service.
async/await is compiler-rewritten code: the await keyword marks yield points; the compiler emits a switch-based state-machine continuation. The "function coloring" critique (async functions can only be called by async functions) is real but Hejlsberg argues the alternative (implicit suspension everywhere) is worse for reasoning.
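To make the state-machine rewrite concrete, here is a hand-written version of what a compiler could produce for a two-await function. It is a conceptual sketch in Python rather than the C# compiler's actual output, and the `resume`/`run` names are invented for illustration:

```python
class FetchAndAdd:
    """Hand-written equivalent of:

        async def fetch_and_add():
            a = await get_value("a")
            b = await get_value("b")
            return a + b
    """

    def __init__(self):
        self.state = 0  # which await point the function is parked at
        self.a = None   # locals are hoisted into fields so they survive suspension

    def resume(self, value=None):
        """Advance to the next await; returns ('await', request) or ('done', result)."""
        if self.state == 0:
            self.state = 1
            return ("await", "a")
        if self.state == 1:
            self.a = value
            self.state = 2
            return ("await", "b")
        if self.state == 2:
            self.state = 3
            return ("done", self.a + value)
        raise RuntimeError("state machine already finished")


def run(machine, lookup):
    """A toy driver that satisfies each await immediately from a dict."""
    kind, payload = machine.resume()
    while kind == "await":
        kind, payload = machine.resume(lookup[payload])
    return payload


result = run(FetchAndAdd(), {"a": 2, "b": 40})
print(result)  # 42
```

Each `if self.state == ...` branch corresponds to the code between two `await` keywords, which is why the yield points have to be marked explicitly in the source.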
Paul Graham's two-pronged thesis live in Stockholm: ambitious founders should spend a stint in Silicon Valley for the same reason painters went to Paris in 1870 — and Stockholm could plausibly become Europe's Valley because no city currently holds that title[18]YC — Paul Graham, Live from Stockholm.
Paris for painting in 1870, Göttingen for math in 1900, Hollywood for movies in 1950. The reasoning, Graham says, "doesn't even know the dotted line [national border] is there." Talent expands in two dimensions: better people, more of them, clustered densely enough to feel intoxicating.
Unplanned meetings appear disproportionately valuable in biographies of great work. Possible reasons: more of them; planned meetings are too conservative and lop off outliers; unplanned conversations select themselves (you decide in the first few sentences whether to continue). Big centers produce more collisions.
Better people are more confident and decisive. Valley investors decide faster than European ones — not just from confidence but from competition: the more right an investor is, the less time they have before someone else moves. Graham cites Yuri Sagalov investing in Max on first meeting as the characteristic story. Despite high valuations and rushed decisions, Valley investors empirically outperform European ones.
Outside the Valley, local investors implicitly assume local startups are second-rate — a universal "no prophet in their own country" rule, not Sweden-specific. Going abroad and coming back resets your status at home.
Suno's Mikey Shulman tells Sequoia his core technical bet was to give the model no musical priors — no 12 Western tones, no instrument categories. The audio model treats everything as raw float32 waveform sampled 48,000x/sec, which is what makes genuinely novel results possible[19]Sequoia — Mikey Shulman, Suno. He rejects the "AI Spotify" framing — music is uniquely social and tied to identity, and the future is active creation, not passive consumption.
Harvard physics PhD (quantum computing, solid-state spins) → Kensho (met Harrison Chase, very early Discord user) → basement jam sessions that became Suno. The team originally believed good music generation was orders of magnitude out of reach and started by trying to make sense of audio first.
You won't get the next Skrillex using Suno if you tell the model what tones and instruments exist.
Telling a model there are 12 Western tones or a fixed set of instruments permanently caps the output. Suno models raw sound as continuous float32 at 48kHz. Breakthroughs in efficient audio compression let them sidestep the compute requirements. Prompts flow through LLMs that draft lyrics and expand style cues, which the audio model turns into sound[20]Sequoia clip — Suno's modeling breakthrough.
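A back-of-the-envelope calculation shows why compression matters at this sample rate. The sample rate and float32 width come from the text; the mono default and the three-minute song length are assumptions for illustration:

```python
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 4  # float32


def raw_bytes(seconds: float, channels: int = 1) -> int:
    """Uncompressed size of a float32 waveform of the given duration."""
    return int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels * seconds)


print(raw_bytes(1))    # 192000 bytes for one second of mono audio
print(raw_bytes(180))  # 34560000 bytes for a three-minute mono track
```

At 48,000 values per second, a single track is millions of numbers long, far more than a sequence model can comfortably attend over without some learned compression in front of it.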
Music models stay deliberately small — there are no benchmarks with right answers, scale is a less efficient lever than for language, and smaller models serve UX (faster generation). Progress comes mostly from research plus enormous amounts of human preference data. Sycophancy isn't a concern the way it is for text, so preference signal can be used more aggressively via RL. V5/V6/V7 step changes are nonlinear and don't strongly correlate with measured preference deltas.
"AI Spotify" is the wrong framing — music is social, tied to identity. Everyone has opinions about music in a way they don't about film or literature. The future is active creation tooling, not better passive recommendation.
Harvard geneticist David Reich tells Dwarkesh Patel that ancient DNA shows strong, ongoing natural selection against schizophrenia and bipolar risk alleles over the past 10,000 years — yet the conditions persist. His proposed resolution: the milder, subclinical end of the same trait spectrum (imagination, neuroticism, vision-seeing) was advantageous in shamanistic and religious communities[21]Dwarkesh Patel — David Reich.
Reich opens ~00:00 with the empirical claim from ancient DNA: "very, very strong natural selection" against schizophrenia/bipolar risk alleles over the last 10,000 years, measurable in the modern human genome. Dwarkesh presses on the apparent contradiction — if selection is this strong, why do we still see the full spectrum from mental illness to creative genius?
Reich's answer is balancing selection: the alleles producing clinical illness in modern contexts produced socially-valued traits in ancestral or alternative cultural environments. Subclinical versions of the same loading show up as anxiety, imagination, neuroticism — adaptive in niches that prize visions and unconventional thinking. He notes ~01:00 that this valuation isn't purely historical — religious communities today still esteem people who report communicating with God.
Note: the available transcript was a short clip (~1.4KB); the full long-form interview likely covers more ground than represented here.
A short Lenny's Podcast clip argues that mission-protective provisions (founder voting structures, board controls) get treated as premature by every stakeholder at every stage — lawyers at incorporation, VCs after fundraising, CFOs before IPO — until they're impossible to put in place[22]Lenny's Podcast — Mission protection can't wait. Only 20% of founders are still CEO three years post-IPO.
The recurring pushback pattern: at incorporation, lawyers say "get PMF first — success is your leverage." After raising, VCs say "we believe in you, bundle it with the IPO." By IPO, the CFO says "oh, you were serious? Too late." Each stage defers until the leverage to enforce it has evaporated. The speaker says he's personally seen the pattern hundreds of times.
It is always too early until it's too late.
Noah Brier runs Claude Code directly on top of his Obsidian vault (~1500 markdown files, git-synced) and uses it as a thinking partner first, code tool a distant second[23]Every — Claude Code Can Be Your Second Brain. His biggest fight with the model: every LLM wants to immediately produce an artifact even when you only want to think. His prompt fix is blunt: "Take this literally. Do not create outlines, drafts, or any versions of talks/writing. Only gather and organize the requested materials."
Brier abandoned Evernote for Obsidian specifically because Obsidian is a folder of markdown files ~02:01 — git-syncable, Claude Code-readable. He organizes with PARA ~11:07. He starts Claude Code at the vault root (not in a project subfolder) so it can search across all notes. He also added a package.json at the vault root for custom slash-command code commands ~18:13.
Brier's biggest complaint about all current models ~15:11: they jump to artifact production even when you only want a thinking partner. He keeps front-matter instructions stored in note headers and a dedicated "thinking partner" sub-agent in Claude Code. For a current conference talk project ("Transformers are eating the world"), the project folder has subfolders for chats/, daily-progress/, research/, plus a conclusions note — full chat transcripts get pulled in via Obsidian Web Clipper, PDFs into research/, and Claude writes up what he learned each day to push the talk along ~14:11.
Probably my number one Claude Code use is using it as a tool to interact with my notes.
Matt Williams and Ryan riff on what hardware actually serves serious local-AI work right now: M5 Max / 128GB / 4TB / 16-inch is the new floor, OWC and CalDigit are the only docks that survive dual monitor + 10GbE, Qwen 3.6 40B is unusually good at English creative writing, and OMLX finally delivers on MLX's speed promise. They also debate whether to give agents access to bash/rm/find vs an allow-listed shell[24]Matt Williams — Matt and Ryan chat on May 12, 2026.
Theo reacts to Thoric's (Claude Code team) "Unreasonable Effectiveness of HTML" plus Karpathy's supporting post: Markdown has become a restrictive output format for agents; HTML supports tables, CSS, SVG, JS interactivity, spatial layouts, and images — all of which agents are now ready to use[25]Theo — Stop letting your agents write Markdown. Theo agrees on the interactivity and engagement points but pushes back on novelty premium.
Markdown is hard to read past 100 lines, lacks color/diagrams/interactivity ~03:01. Use cases Thoric demonstrates ~06:01: (1) exploration — fan out three implementation options side-by-side; (2) implementation plans with embedded mockups/diagrams; (3) code review with rendered diffs and inline annotations; (4) reports synthesizing Slack/git/Jira into one HTML page; (5) throwaway editors — one-off custom UIs with import/export buttons.
~31:17 Vision is the 10-lane superhighway into the human brain. Progression: raw text → Markdown → HTML (current default) → interactive neural video. Audio is the preferred input to AI; rich visual output is preferred from AI. Hot tip: just ask your LLM to structure responses as HTML.
Theo's pushback: much of HTML's current readability advantage is a novelty premium ~11:06. He agrees the "70%+ throwaway code mindset" (write custom one-off tools freely because they're cheap) is correct and underinternalized. Net: agree on interactivity and engagement; don't go all-in.
Three Simon Willison posts in one day — Datasette gets an official blog (built with OpenAI Codex desktop, with a Markdown session-transcript export Willison "always wanted")[26]Simon Willison — Welcome to the Datasette blog; a CSP allow-list experiment that intercepts blocked fetches in a sandboxed iframe and prompts the user to authorize the domain[27]Simon Willison — CSP Allow-list Experiment; and a Boris Mann quote: "'11 AI agents' is meaningless as a phrase. If I said 'I have 11 spreadsheets' or 'I have 11 browser tabs' to do my work, it means about the same thing."[28]Simon Willison — Quoting Boris Mann
The experiment runs apps inside a CSP-protected sandboxed iframe with default-src 'none' and script-src 'unsafe-inline', then implements a custom fetch() interceptor that catches policy violations. The error gets bubbled up to the parent window, which prompts the user to approve the domain, and the page refreshes with the new allow-list. Hosted at tools.simonwillison.net/csp-allow; built with GPT-5.5 xhigh in the Codex desktop app.
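The allow-list loop can be pictured as regenerating the policy string each time the user approves a domain. The sketch below is an assumption about how such a policy could be assembled, not Willison's code; in particular, granting approved origins through `connect-src` (the directive that governs `fetch()`) and the example domain are illustrative choices:

```python
def build_csp(allowed_origins: list[str]) -> str:
    """Build a Content-Security-Policy value with a locked-down default."""
    directives = ["default-src 'none'", "script-src 'unsafe-inline'"]
    if allowed_origins:
        # Assumption: approved domains are granted via connect-src.
        directives.append("connect-src " + " ".join(sorted(allowed_origins)))
    return "; ".join(directives)


allow_list: set[str] = set()
print(build_csp(list(allow_list)))

allow_list.add("https://api.example.com")  # hypothetical domain the user approved
print(build_csp(list(allow_list)))
```

With an empty allow-list every network request falls through to `default-src 'none'` and is blocked, which is the violation the interceptor catches and surfaces to the parent window.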
The throughline across both quotes today: agent counts are vanity. What matters is what they accomplish, not how many of them exist. Tagged under agent-definitions in Simon's archive.
Two practical local-AI tools dropped today. Llama-Swap is a Go-based proxy that exposes a single OpenAI-compatible endpoint and routes by model field to the right backend (llama.cpp, vLLM, Tabby), auto-starts on first request, and auto-stops idle models to free VRAM[29]Better Stack — Llama-Swap. marimo shows off OpenRouter's new image-model support with a notebook that sends a rough sketch to multiple image models in parallel and lets you composite winners back into the next iteration[30]marimo — OpenRouter image gallery.
Llama-Swap is YAML-configured, supports OpenAI and Anthropic API shapes, and is server-first: no model gallery, no GUI. Aimed at power users running Cursor, Continue, custom agents, or home-lab servers who want precise control over flags, GPU layers, and context sizes — at the cost of a more complex setup than Ollama or LM Studio.
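The core routing idea, picking a backend from the request's `model` field and starting it on first use, fits in a few lines. This is a conceptual sketch in Python, not Llama-Swap's Go implementation or its config format; the model names and upstream URLs are hypothetical:

```python
import json

# Hypothetical model names and upstream URLs; a real setup defines these in its config.
BACKENDS = {
    "qwen-coder": "http://127.0.0.1:9001",
    "llama-chat": "http://127.0.0.1:9002",
}
running: set[str] = set()


def route(request_body: str) -> str:
    """Pick the upstream for an OpenAI-style request based on its "model" field."""
    model = json.loads(request_body).get("model")
    if model not in BACKENDS:
        raise KeyError(f"unknown model: {model!r}")
    if model not in running:
        running.add(model)  # stand-in for launching the backend process on first use
    return BACKENDS[model]


upstream = route('{"model": "qwen-coder", "messages": []}')
print(upstream)
```

Idle shutdown would be the mirror image: a timer that removes a model from `running` and stops its process once no request has named it for a while.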
Two unrelated but pointed security stories. A man-in-the-middle attack can drain up to $10,000 from an iPhone without unlocking it by abusing Apple Pay's Express Transit mode and Visa's lack of cryptographic signing — Apple and Visa are blaming each other and have not patched[31]Better Stack — Apple and Visa Don't Want To Patch This. Separately, CVE-2026-0300 in PAN-OS captive portal is being actively exploited by a likely Chinese state-sponsored actor (CLSGA1132, Volt/Salt Typhoon family) to gain initial access via the firewall itself[32]Low Level — PAN-OS CVE-2026-0300.
Three bit-flips via a Proxmark device: (1) fake a transit terminal to trigger Face-ID-skip, (2) disguise a large charge as a small transit tap, (3) forge user verification. Works only on Visa — Mastercard requires asymmetric RSA signature checks that block the spoof. Mitigation: disable Express Transit card in Apple Pay settings.
Stack-based buffer overflow in PAN-OS captive portal, likely in a custom Nginx module handling the X-Visitor-Name header. Post-exploitation: ptrace injection for stealth, AD enumeration using service account creds stored on the firewall, lateral movement via open-source tools (Earthworm, reverse SOCKS5) to evade signature-based detection. Single-digit known victims so far. Palo Alto is considered relatively secure vs Fortinet/Ivanti — but the firewall being the foothold turns the defender's perimeter into the attacker's launchpad.
Two satisfying systems explainers. time.gov isn't backed by one clock; it's a weighted ensemble of dozens of atomic clocks (cesium fountains like NIST-F2, hydrogen masers) averaged daily into a "paper clock" more stable than any single device. Even general relativity is corrected for: NIST Boulder sits higher than the Naval Observatory in DC, so its clocks tick slightly faster and must be mathematically steered[33]Better Stack — How time.gov works. Font files are not shape databases: TrueType bytecode runs on a VM inside your OS every time text renders, with hinting programs that snap stems to whole pixels at small sizes[34]LearnThatStack — Your Font File Is Secretly a Program.
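For the clock ensemble described above, the arithmetic is a weighted mean plus a small relativistic rate correction. A rough sketch under stated assumptions: the inverse-variance weighting is a common textbook choice rather than NIST's documented algorithm, the function names are hypothetical, and the height used in the usage lines is an illustrative round number.

```python
def ensemble_offset(readings):
    """Weighted mean of clock offsets.

    readings: list of (offset_ns, instability) pairs, one per clock.
    ASSUMPTION: weight = 1 / instability**2, so steadier clocks count for more.
    """
    weights = [1.0 / (instability * instability) for _, instability in readings]
    total = sum(weights)
    return sum(w * offset for w, (offset, _) in zip(weights, readings)) / total


def gravitational_rate_shift(height_difference_m, g=9.80665, c=299_792_458.0):
    """Fractional frequency shift g * dh / c**2 (weak-field approximation).

    A positive height difference means the higher clock runs faster by this fraction.
    """
    return g * height_difference_m / (c * c)


# Illustrative usage: 1600 m is a round number, not a surveyed elevation.
shift = gravitational_rate_shift(1600.0)
drift_ns_per_day = shift * 86_400 * 1e9  # accumulated lead over one day, in ns
```

The shift is about one part in 10^16 per meter of height, which is small but well within what modern atomic clocks can resolve, so the steering matters.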
The font explainer also covers what font-size actually controls, which is why two 16px fonts can look like different sizes. Practical takeaways: use @font-face metric overrides (size-adjust, ascent-override) to prevent CLS during font swap; subset fonts to only the glyphs used (a full CJK font is tens of MB, an English subset is 1–2 orders of magnitude smaller).
Altman trial: Sam Altman testified that Musk would only accept OpenAI's for-profit conversion if he kept total control, including a proposal to fold OpenAI into Tesla and to "pass control to his children"[35]Tech Brew — Altman claims Musk wanted total control. Hims pivots: Hims & Hers posted its first quarterly loss in 5+ years ($33M Q1) after abandoning compounded GLP-1s and pivoting to distribute Novo Nordisk and Eli Lilly branded treatments[36]Sherwood Snacks — Hims' strategic pivot. Bezos vs Dorsey: when Dorsey pitched him on Steve Jobs' "say no to everything" focus doctrine, Bezos rejected it on the spot: "I like to do everything. My team has to talk me out of stuff"[38]Sequoia clip — Bezos vs Dorsey on focus.
Cross-examination by Musk's attorney Steven Molo cited testimony from former OpenAI execs Mira Murati and Tasha McCauley questioning Altman's trustworthiness. Molo also pressed Altman on his 2023 Senate testimony where he claimed no equity stake in OpenAI — Altman subsequently acknowledged an indirect stake through Y Combinator. The trial has triggered a House Oversight investigation and Republican AG calls for an SEC review, with the IPO looming.
The $33M Q1 loss reflects write-downs on compounded GLP-1 supply-chain inventory now at risk of obsolescence. Novo Nordisk sued for patent infringement after Hims copied its Wegovy pill (launched Jan 2026), then dropped the suit in exchange for Hims becoming a distribution channel. Going forward, the focus is hormone treatments, lab blood tests, and international expansion; CEO Andrew Dudum also teased a proprietary wearable and peptide sales pending FDA clearance.
A 2011 board-recruitment meeting: Jack Dorsey pitched Bezos on Steve Jobs' editor-in-chief / "say no to everything" model. Bezos laughed it off.
I like to do everything. My team has to talk me out of stuff. There's lots of ways to be successful. There's many ways to climb the mountain.
Why AI keeps lying to you: DeepLearningAI on sycophancy — models are RLHF-trained to flatter, fix it with neutral prompting and factual context[39]DeepLearningAI — Why AI keeps lying to you. Three bad bosses: Real Python on the Artist (cares about craft, not people), the Dictator (one-on-ones as monologues), and the Knife (literally pulled out a knife mid-meeting) — a manager tells you where you are; a leader tells you where you're going[40]Real Python — 3 Bad Bosses. Ford-Ferrari: Enzo blew up the deal when he realized Ford-controlled budget meant Ford-controlled racing — and Ford had no racing capability. Henry Ford II responded by vowing to beat Ferrari at Le Mans[37]Acquired — Ford tried to buy Ferrari.
Manager's job is to tell you where you are. The leader's job is to tell you where you are going.
On the Ford clip: the draft agreement split Ferrari into two entities: Ford Ferrari (90% Ford-owned) for road cars and Ferrari Ford (90% Ferrari-owned) for racing. Enzo realized that a Ford-controlled budget meant Ford-controlled racing decisions, and Ford had no racing program of its own. He killed the deal at the last moment. An enraged Henry Ford II launched Ford's racing program; Enzo reportedly viewed the Americans as naive throughout.