May 12, 2026
The FT reports Anthropic is weighing up to $50B at a $900B pre-money valuation ahead of a fall IPO — which would leapfrog OpenAI's $852B March mark.[1]The AI Daily Brief A Solana-traded pre-IPO instrument backed 1:1 by SPV exposure was implying a $1.2T valuation last week. Cerebras is upsizing its IPO with orders running 20× supply. The structural footing for Anthropic's IPO posture — a PBC plus a Long-Term Benefit Trust where outside safety experts (no equity) appoint board seats — is what Lenny's guest credits for letting Anthropic walk away from a $200M Pentagon contract.[2]Lenny's Podcast
~00:30 Anthropic last raised in February at $380B; revenue has "gone parabolic" since. Sources told the FT a final pre-IPO round could clear up to $50B at $900B pre-money, which would put Anthropic instantly above OpenAI's $852B March mark and into the top 20 public companies if it priced the IPO at the same level. One investor said "people are ready to throw any dollar amount at Anthropic"; another said the SpaceX compute deal "dramatically derisked the investment."[1]The AI Daily Brief
Anthropic has resolved the biggest bottleneck and potential source of weakness, which is compute.
~02:30 Reuters reports Cerebras may boost its IPO price from $115–125 to $150–160, lifting valuation from ~$26B to over $34B, with the offering upsized from 28M to 30M shares. Demand was 3× supply last Tuesday and is now reportedly 20×. Polymarket projects a $50B+ market cap on day one. Beth Kindig was characteristically blunt: "I wouldn't touch Cerebras with a 100-ft pole. Sure, it may go up on hype sentiment in the short term, but it's far too fundamentally risky because of scaling and future execution."[1]The AI Daily Brief
~00:00 Anthropic was set up as a Public Benefit Corporation with a Long-Term Benefit Trust before ChatGPT existed. The Trust is made of outside AI safety experts who appoint and hold accountable specific directors on the for-profit board — and crucially, they hold no equity, so financial pressure doesn't bend them. Lenny's guest argues this is what enabled Anthropic to turn down a $200M Pentagon contract: "investors can't just oust Dario on a moment's notice."[2]Lenny's Podcast
They turned down a $200 million contract and bore the wrath of the world's largest army and government. That took a lot of courage.
OpenAI shipped Codex computer use on Mac with a separate cursor, so it can drive Spotify, UTM, Reminders, and iMessage while you keep working on your own machine.[3]OpenAI The trick is leaning on Apple's accessibility framework to read off-screen UI structure, which means Codex can use the new non-multimodal Codex Spark model and run "faster than you can." Same day, OpenAI launched a Codex Chrome plugin giving Codex direct browser context in its own tab group.[1]The AI Daily Brief
~03:02 The differentiator from every prior computer-use implementation, per the OpenAI demo: Codex gets its own cursor. The live demo had it simultaneously spinning up a macOS VM in UTM (including downloading macOS), starting playback in Spotify, and adding a reminder — while the presenter kept typing.
A lot of computer use implementations, in fact every computer use implementation I've ever seen, takes over your entire computer. So you can't use your computer while the agent is using your apps.
~06:02 The technical lift: instead of relying only on screenshots, Codex pulls structured text from Apple's accessibility framework — including off-screen elements. That means it doesn't need a multimodal model, which unlocks the fast non-multimodal Codex Spark. Composing and sending an iMessage took seconds.
Because it doesn't require images necessarily, we can use non-multimodal models like Codex Spark which are super fast. And so all of a sudden you have this experience where computer use can use software even faster than you can.
~09:03 Permissions are per-app. The first time Codex touches a new app, it asks; sensitive apps simply stay invisible to it. Mac only at launch, Windows coming.
~07:15 NLW also flagged a same-day Codex Chrome plugin: instead of scraped-connector data, Codex now gets direct browser context in its own tab group while you keep yours. Useful for web devs testing flows, and for non-technical users, multi-tab form-filling.[1]The AI Daily Brief
Claude Code's new Agent View collapses a wall of terminal tabs into a single keyboard-driven dashboard: sessions are yellow when waiting for input, green when done, and you can reply or kill from the overview.[4]Nate Herk | AI Automation The new --bg flag launches an agent straight into the background view, and a companion /goal command lets Claude Code pursue a long-running objective overnight.
~00:00
Press the left arrow in any session to reveal the dashboard; right arrow to dive in; Ctrl+X twice to kill.
~05:03
claude --bg "build me a 3D monster game" launches an agent directly into the dashboard — handy when you're spawning work across different project directories.
~03:01
The new /goal command sets an autonomous objective, and goal sessions appear in Agent View. Still in research preview, so expect a few rough edges.
Vercel's Nico Albanese frames 2026 agents as three building blocks — runtime, tools, and a computer — and walks through an AI SDK 6 build where a single bash tool plus a persistent named Vercel Sandbox lets the agent write Python scripts into its own filesystem and re-use them later.[5]AI Engineer — Nico Albanese The talk is also a strong contrarian take on context engineering: aggressive compaction kills prompt caching, and his real coding agent ran 104 minutes / 316 tool calls / 29 files with zero compaction at a 95% cache-read ratio.
~07:41
AI SDK 6 introduces ToolLoopAgent: define an agent once, then get end-to-end typed messages flowing through createAgentUIStreamResponse and the useChat hook to React components via InferAgentUIMessage. The old pattern of 2,000-line Next.js route handlers calling generateText/streamText directly is on its way out.
~17:50
The thesis: runtime + tools + computer.
~18:52
The AI SDK now distinguishes custom, provider-defined, and provider-executed tools (where the LLM provider runs the tool server-side and returns results inline) — demoed with OpenAI's webSearch.
~45:18
Vercel Sandbox now supports named/persistent sandboxes: the filesystem snapshots on idle and rehydrates on the next request, so the agent feels like it's reaching back into the same machine across sessions.
~33:04 Nico argues million-token windows have flipped the math: cache-aware prompt structure beats aggressive compaction, because pruning context invalidates the cache. He reports a 95% cache-read ratio on his personal coding agent and one run that hit 104 minutes / 316 tool calls / 29 files at only 32% of GPT-5.4's window — zero compaction. Where memory is needed, push it off-thread via sub-agents that hand back ~1k-token summaries (Amp's handoff-tool pattern). Auto-compaction is the villain story: a Meta exec lost an inbox when noisy tool calls pushed the original instructions out of context.
It ran for 104 minutes in one single turn… used 316 tool calls, changed 29 files, and it used only 32% of GPT-5.4's context window. I have zero compaction running on this.
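A rough sketch of why compaction and prompt caching pull against each other, assuming a provider that reuses work only for an exact shared prompt prefix (a simplification: real caches operate on token blocks and expire). The message lists and both helper functions are hypothetical and only illustrate the argument above.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading messages two consecutive requests have in common."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n


def cache_read_ratio(previous: list[str], current: list[str]) -> float:
    """Share of the current request (by characters) that sits in the shared prefix."""
    shared = cached_prefix_len(previous, current)
    total = sum(len(m) for m in current)
    return sum(len(m) for m in current[:shared]) / total if total else 0.0


turn_1 = ["system: you are a coding agent", "user: fix the bug", "tool: grep output ..."]

# Append-only history: the whole previous request is a prefix of the next one.
append_only = turn_1 + ["tool: test output ..."]

# Compaction: earlier messages are rewritten into a summary, so the prefix
# diverges right after the system prompt and everything after it is re-read cold.
compacted = [turn_1[0], "summary: user asked for a bug fix; grep ran", "tool: test output ..."]

print(cached_prefix_len(turn_1, append_only))  # 3
print(cached_prefix_len(turn_1, compacted))    # 1
```

Under this toy model the append-only request reuses all three earlier messages, while the compacted one reuses only the system prompt.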
~50:22
One bash tool is enough — modern models already know find, ls, grep, glob. Inject the sandbox into the tool via runtime context (Zod-shaped call options) and the agent has both shell and filesystem.
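A minimal sketch of the single-tool idea, written in Python rather than with the AI SDK. A temporary directory stands in for the sandbox, so nothing here is isolated the way a real sandbox would be, and `bash_tool` and its return shape are hypothetical names chosen for illustration.

```python
import shutil
import subprocess
import tempfile

# Fall back to sh if bash is not installed on this machine.
SHELL = shutil.which("bash") or "sh"


def bash_tool(command: str, cwd: str, timeout_s: int = 30) -> dict:
    """Run one shell command inside the working directory and report the result."""
    proc = subprocess.run(
        [SHELL, "-c", command],
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


# The directory is passed in at call time, mirroring the idea of injecting
# the sandbox into the tool through runtime context.
sandbox_dir = tempfile.mkdtemp(prefix="sandbox-")

bash_tool("echo 'print(21 * 2)' > answer.py", cwd=sandbox_dir)  # agent writes a script
listing = bash_tool("ls", cwd=sandbox_dir)                       # and can find it again
result = bash_tool("cat answer.py", cwd=sandbox_dir)
print(listing["stdout"].strip())  # answer.py
```

Because the tool returns stdout, stderr, and the exit code, the model can inspect its own filesystem with ordinary shell commands instead of needing separate read, write, and search tools.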
~62:40
A memories.md file is loaded into the system prompt every prepareCall; the agent generates a Python weather script, persists it, and re-uses it next turn. The closing demo is an internal Vercel product built on this stack: 23 users, 3.8B tokens, ~350 PRs, 90–91% cache-read ratio.[5]AI Engineer — Nico Albanese
We feel very strongly that bash is all you need.
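The memory pattern can be sketched in a few lines: rebuild the system prompt before every model call from a file the agent can append to. This is an illustration in Python, not the talk's code; `BASE_PROMPT`, `build_system_prompt`, and `remember` are hypothetical names, and a temporary directory stands in for the persistent sandbox.

```python
import tempfile
from pathlib import Path

BASE_PROMPT = "You are an agent with a bash tool and a persistent filesystem."


def build_system_prompt(memory_file: Path) -> str:
    """Rebuild the system prompt before each model call, appending saved memories."""
    memories = memory_file.read_text() if memory_file.exists() else ""
    if not memories.strip():
        return BASE_PROMPT
    return f"{BASE_PROMPT}\n\n# Memories\n{memories.strip()}"


def remember(memory_file: Path, note: str) -> None:
    """Append one note so later turns can see it."""
    with memory_file.open("a") as fh:
        fh.write(f"- {note}\n")


memory_file = Path(tempfile.mkdtemp()) / "memories.md"

first = build_system_prompt(memory_file)  # no memories yet
remember(memory_file, "weather.py fetches the forecast; reuse it instead of rewriting")
second = build_system_prompt(memory_file)  # the next call sees the note
print(second)
```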
Alessandro Cappelli (Adaptive ML) argues that 95% of GenAI pilots stall not in "the last mile" but in the actual marathon from MVP to production — and that reinforcement learning, not prompts or SFT, is the only post-training method that systematically integrates production feedback.[6]AI Engineer — Cappelli, Adaptive ML Customer cases: AT&T, Manulife, and CCS (a medical supply company training on real, sometimes panicking caller transcripts).
~01:14 The case against prompts and SFT: each fix introduces new defects, and SFT data is too expensive to keep iterating on after launch. RL mathematically integrates feedback from business metrics, environment rewards, and judges.
~04:15 Three production unlocks: SFT-quality output from a 10B-class model (Gemma/Mistral/Qwen); tokenomics that close (AT&T was spending "millions" just to summarize agent transcripts); sub-300ms latency for voice support, which large frontier models can't hit.
~07:19 Agents multiply token spend by 10× and raise the stakes, since agents do database writes, which is why Adaptive trains Qwen 3.5 against live environments at Manulife. Where no environment exists, both tools and "mock users" are themselves LLMs; CCS's real transcripts ground the simulation.
~11:24 Cappelli is openly skeptical of RLHF-as-marketed: annotation campaigns are expensive and miserable. Use humans to write rubrics, system prompts for LLM judges, and scenarios — a few hours, not weeks. Direct signals (does the code run?) and KPIs (CCS optimizes containment rate) do the heavy lifting; LLM-as-judge handles tone and brand. Adaptive Engine hides the PPO/GSPO orchestration behind recipes — "PPO requires orchestrating not one, but four LLMs at the same time."[6]AI Engineer — Cappelli
Reinforcement learning is disproportionately more effective than instruction fine-tuning, and likewise versus prompting.
Nobody wants to run an annotation campaign. It is either expensive, or it is really useless.
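One way to picture the signal mix described above (direct checks, a KPI, and an LLM judge for tone) is a small reward function. This is a sketch under stated assumptions: the weights, the gating rule, and the keyword-based `judge_tone` stand-in are all hypothetical and are not taken from the talk or from Adaptive Engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Episode:
    response: str
    code_ran: bool        # direct signal from the environment
    call_contained: bool  # KPI-style outcome, e.g. resolved without escalation


def judge_tone(response: str) -> float:
    """Stand-in for an LLM judge scoring tone against a human-written rubric (0 to 1)."""
    banned = ("obviously", "as i said")
    return 0.0 if any(phrase in response.lower() for phrase in banned) else 1.0


def reward(ep: Episode, judge: Callable[[str], float] = judge_tone) -> float:
    """Blend the signals; the weights are illustrative only."""
    if not ep.code_ran:
        return 0.0  # a failed direct check overrides everything else
    return 0.6 * float(ep.call_contained) + 0.4 * judge(ep.response)


good = Episode("Your order ships today.", code_ran=True, call_contained=True)
rude = Episode("As I said, obviously it ships today.", code_ran=True, call_contained=True)
broken = Episode("Your order ships today.", code_ran=False, call_contained=True)
print(reward(good), reward(rude), reward(broken))
```

The human effort in this picture goes into the rubric behind the judge and the choice of KPI, not into labeling individual responses.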
Vincent Koc (Comet, OpenClaw contributor) argues that static benchmarks are calcifying because the harnesses around them — OpenClaw, Claude, Codex — now self-modify.[7]AI Engineer — Vincent Koc His prescription: treat evals as code and as living agents — rubrics, self-curating suites from production traces, telemetry in the loop, Karpathy-style optimization toward a defined end state.
~02:09 Map software engineering onto AI eval: unit tests, regression suites, CI/CD, chaos engineering, observability. AI eval has most of these — it's missing the chaos layer, which is why benchmarks don't survive contact with production. ~04:10 The harness changes itself.
The harness changes itself. Like the harness will shift… that adaption that we're seeing inside of things where software's being shipped at lightning speed, how does your benchmarks keep up with that?
~05:11 Arc of the era: prompt engineering (died ~2023, people still do it), context engineering (RAG + tool calling made agents steerable), and now intent engineering, where agents are personalized to the user and the eval target is the user's goal, not a static prompt.
~10:16 Adaptive eval mechanisms: rubric grading, suites self-curated from traces when the customer base shifts, always-on online eval, "telemetry in the loop" so the harness consumes its own errors and cost signals.
~13:17 The 80/20: 80% can be locked down with intent-defined checks, but the 20% that breaks your business is exactly what static benchmarks miss.
People need to start looking at the evals not as this like static data set thing, but actually as like code as like software or as like a living agent.
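To make "evals as code" concrete, here is a small sketch of two of the mechanisms listed above: a suite that curates itself from failed production traces, and a rubric grader. Every name (`Trace`, `EvalSuite`, `grade`) is hypothetical, and the substring rubric is a deliberately crude stand-in for whatever grader a real harness would use.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    user_goal: str
    output: str
    error: bool = False


@dataclass
class EvalSuite:
    cases: dict[str, str] = field(default_factory=dict)  # goal -> first failing output seen

    def curate(self, traces: list[Trace]) -> int:
        """Add a case for every failed trace whose goal is not covered yet."""
        added = 0
        for t in traces:
            if t.error and t.user_goal not in self.cases:
                self.cases[t.user_goal] = t.output
                added += 1
        return added


def grade(output: str, rubric: list[str]) -> float:
    """Rubric grading: share of required phrases present in the output."""
    if not rubric:
        return 0.0
    hits = sum(1 for item in rubric if item.lower() in output.lower())
    return hits / len(rubric)


suite = EvalSuite()
traces = [
    Trace("cancel subscription", "I cannot help with that", error=True),
    Trace("reset password", "Link sent"),
    Trace("cancel subscription", "Timeout", error=True),
]
added = suite.curate(traces)
score = grade("Refund issued and confirmation emailed.", ["refund", "confirmation", "apology"])
print(added, round(score, 2))
```

The suite grows only when production surfaces a goal it does not yet cover, which is the sense in which the eval follows the customer base rather than a fixed dataset.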
The Analytics Power Hour hosts swap pre-AI analytics lessons that still hold up — and argue that durable wisdom (good visualization, outcomes vs. outputs, exploration vs. refinement, trust) matters more now that AI makes generating mediocre output trivial.[8]Analytics Power Hour #297
~00:05 Michael's opener: everyone's now been handed obvious AI slop, and what's missing isn't capability but evidence that a person thought about it.
~02:09 Julie's color-randomization Shiny app, born from a wine-thoughts moment (the standalone clip[9]APH clip: wine-thoughts), still earns its keep because a good visual model clicks in a way an AI one-shot doesn't.
~09:16 Tim's outputs vs. outcomes framing, lifted from Pat Craig at a United Way committee 20 years ago: a soup kitchen counts trays served (output) but the goal is reduced food insecurity (outcome). Julie admits on-air she'd been thinking input/output for years and the outcome version is meaningfully better.
~16:25 Val's pasted-into-50-decks visual from the Optimizely AB Testing book: refinement (a cone narrowing past the real optimum) vs. exploration (branches that find global best).
~22:28 Mo's hot take: in the AI era, "five wildly different MVPs with clear evaluation criteria" beats "one polished chat experience."
~23:30 Michael's close (and the substance of the standalone "messenger matters" clip[10]APH clip: messenger): trust is hard to build and easy to break; the data does not speak for itself.
I'd rather try five wildly different things with AI in a minimal way with clarity on how I'm going to determine whether this is the best bet, than determine we're going to make the best chat experience using the latest LLMs ever and just pursue that and miss it.
Trust is hard to build and easy to break.
Theo walks through Jarred (Bun) starting a 960k-LOC Rust port six days ago, with 99.8% of Bun's existing test suite already passing on Linux x64 glibc.[11]Theo - t3.gg The bigger argument is that with Claude Code visibly "falling apart" (Anthropic's own postmortem on reduced default reasoning, the OpenClaw billing-by-commit-message mess), Bun under Anthropic could enshittify the same way, with bugs that don't affect Claude Code's monorepo workflows getting deprioritized.
~00:00 Setup: even Bun fans (Dax of OpenCode) are migrating back to Node: Windows stability, Electron compatibility, no separate-process spawning, and uncertainty under Anthropic.
~03:02 Zig itself: powerful comptime, but not memory-safe, weird community dynamics, and Windows pain.
~05:04 Bun's perf wins are real but oversold outside of package management and bundling, with only ~3× on Express (20k → 60k req/sec).
~08:06 Citing William Johnson, Theo links Bun's risk to Anthropic's recent Claude Code regressions: reduced default reasoning effort, a stale-session bug, a prompt change that hurt coding quality, and the OpenClaw mess where commit-message content could route billing.[11]Theo - t3.gg
Bun is embeddable in Claude Code. Claude Code appears to be enshittifying. So now I have to worry that Bun could enshittify too.
~10:10
Charlie Marsh's uv uses ~73 unsafe blocks across 350k LOC. Bun's Rust port: 13,044 unsafe blocks across 681k Rust LOC (plus 571k Zig still on the floor), with roughly 2–3× comment density (fingerprints of LLM-generated code).
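Normalizing those counts per thousand lines makes the gap easier to read. The arithmetic below uses only the figures quoted above (the 73 is approximate in the source); the helper name is hypothetical.

```python
def unsafe_per_kloc(unsafe_blocks: int, lines_of_code: int) -> float:
    """Unsafe blocks per 1,000 lines of code."""
    return unsafe_blocks / (lines_of_code / 1000)


uv_density = unsafe_per_kloc(73, 350_000)
bun_density = unsafe_per_kloc(13_044, 681_000)

print(f"uv:    {uv_density:.2f} unsafe blocks per 1k LOC")   # 0.21
print(f"Bun:   {bun_density:.2f} unsafe blocks per 1k LOC")  # 19.15
print(f"ratio: {bun_density / uv_density:.0f}x")             # 92x
```

On these numbers the port carries roughly 92 times as many unsafe blocks per line as uv.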
~11:11
The Bun team isn't writing idiomatic Rust; they're doing a line-by-line port of Zig — the same move the TypeScript-to-Go port made deliberately, and it still came out faster simply by being native.
They aren't really writing Rust. They are writing C++ with Rust syntax.
13,044 calls to unsafe. Hopefully this emphasizes the problem properly.
Tariq Shaukat's Anthropic post arguing agents should output HTML (information density, visual clarity, sharing, two-way interaction) has hit ~10M views. NLW pushes the conversation further: in the agent era, the operator's job is staging the conditions for the agent to produce, not producing the final artifact — and HTML expresses "mixed doneness" (locked / leaning / open) natively in a way markdown can't.[1]The AI Daily Brief
~09:30 Shaukat's five reasons for HTML: density (tables, CSS, SVG), visual clarity (tabs, mobile-responsive), sharing (browsers render natively → higher click-through), two-way interaction (sliders, copy-back-into-Claude prompts), and "it's just more fun." ~11:06 His shift: he's no longer editing these files — Claude is — which kills Markdown's main benefit.
~17:30 Audience? Claude reading → markdown; humans reading → HTML. Lifecycle? Edited many times → markdown; written once → HTML. Horizon? Indexed/lasting → markdown; ephemeral → HTML.
The question isn't markdown versus HTML. It's for this specific document, who reads it, who edits it, and how long does it live? Answer those three and the format picks itself.
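The three questions translate directly into a small decision function. How to resolve a split answer is not stated in the episode, so the majority vote here is an assumption, and the argument values (`"agent"`, `"many"`, `"lasting"` and their opposites) are hypothetical labels.

```python
def pick_format(reader: str, edits: str, horizon: str) -> str:
    """Pick a document format from audience, lifecycle, and horizon.

    reader:  "agent" or "human"
    edits:   "many" or "once"
    horizon: "lasting" or "ephemeral"
    """
    votes = [
        "markdown" if reader == "agent" else "html",
        "markdown" if edits == "many" else "html",
        "markdown" if horizon == "lasting" else "html",
    ]
    # Assumption: with three votes there is always a majority, and it wins.
    return max(("markdown", "html"), key=votes.count)


print(pick_format("human", "once", "ephemeral"))  # html
print(pick_format("agent", "many", "lasting"))    # markdown
print(pick_format("human", "many", "lasting"))    # markdown
```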
~19:00 The deeper shift: pre-AI, your job was "blank page → finished goal as fast as possible." Now most of the workday lives in an in-between space where you maintain the conditions for finishing well. The hard calibration problem is "mixed doneness":
If you've overspecified, you kill the agent's range and you take away the things it does better than you. If you stay too vague, the agent can flail or produce generic output or ask endless clarifying questions. The new skill is in calibrating how much structure to impose so that the unstructured remainder is something the agent can actually productively resolve.
~24:30 HTML helps because tabs, progressive disclosure, side-by-side cards, color-coded status, and annotations all encode mixed doneness natively — instead of relying on parenthetical caveats inside prose.
Nate B Jones lays out the six layers of an agentic purchase (discovery, authorization, credential, payment rails, governance, liability) and the six protocol camps fighting over them — ACP (OpenAI/Stripe), UCP (Shopify/Google), AP2 (Google), the card networks (Visa/Mastercard/PayPal), stable-coin rails (Coinbase x402, Stripe MPP), and AWS Bedrock AgentCore as the enterprise runtime.[12]Nate B Jones — agentic commerce
~02:00 Agentic commerce breaks the old web's bundled evidence (session + page + click). The question shifts from "can the customer pay" to "how does everyone know the agent was allowed to do this." That single shift touches identity, fraud, credentials, settlement, refunds, liability, and data rights.
~03:03 OpenAI/Stripe's ACP answers agent-surface checkout cleanly — merchant stays merchant-of-record but loses discovery, ranking, and brand presentation inside ChatGPT. Shopify/Google's UCP preserves merchant rules, inventory, loyalty, returns across agents. Different questions, different power centers.
~06:04 Stripe's Approved Payment Link gives an agent a one-purchase token; Google's AP2 gives a longer-lived mandate (scope, constraints, proof of user approval). Mastercard Agent Pay, Visa Intelligent Commerce, and PayPal are all chasing the trusted-transaction layer rather than the recommendation layer. ~09:05 For software-to-software micropayments — agents paying APIs, model calls, per-task SaaS — Coinbase's x402 revives HTTP 402 ("payment required") so payment is part of the web request itself, and Stripe's MPP plus Bridge/Privy/Tempo supplies the wallet stack.
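The "payment as part of the web request" idea can be sketched without a network: a handler answers 402 with a price, and the agent retries with proof of payment. This is a toy model of the request/response shape only. The header name, the proof format, and `verify_payment` are placeholders and are not taken from the x402 specification.

```python
from http import HTTPStatus

PRICE_USD = "0.01"
PAYMENT_HEADER = "X-Payment-Proof"  # placeholder name, not from any spec


def verify_payment(proof: str) -> bool:
    """Stand-in for real settlement checks against a payment facilitator."""
    return proof == f"paid:{PRICE_USD}"


def handle_request(headers: dict[str, str]) -> tuple[int, dict]:
    """Serve the resource only when the request carries a valid payment proof."""
    proof = headers.get(PAYMENT_HEADER)
    if proof is None or not verify_payment(proof):
        return HTTPStatus.PAYMENT_REQUIRED, {"price_usd": PRICE_USD, "pay_via": PAYMENT_HEADER}
    return HTTPStatus.OK, {"data": "forecast: sunny"}


def agent_fetch() -> tuple[int, dict]:
    """Client side: try once, pay if asked, then retry with the proof attached."""
    status, body = handle_request({})
    if status == HTTPStatus.PAYMENT_REQUIRED:
        proof = f"paid:{body['price_usd']}"  # a real agent would settle funds here
        status, body = handle_request({body["pay_via"]: proof})
    return status, body


print(agent_fetch())
```

The point of the pattern is that the price quote and the payment both travel over the same HTTP exchange as the API call itself, so no separate checkout session is needed.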
~12:07 AWS Bedrock AgentCore Payments — built with Coinbase and Stripe — positions AWS as the governance runtime. It doesn't need to own a payment rail because it owns the environment where agents actually run, with task, policy, budget, and history.[12]Nate B Jones — agentic commerce
A payment provider just sees the payment. The agent platform sees all the work around the payment and has a lot of leverage long term.
~15:12 Nate's framing: every camp picks a layer to take real responsibility at. Companies that can't define identity, permission, payment, settlement, refunds, and liability "aren't ready to let agents transact."
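A longer-lived mandate of the kind described earlier (scope, constraints, proof of user approval) could look roughly like this. It is a hypothetical shape for illustration, not AP2's schema: the field names, the HMAC signature, and the hard-coded demo key are all assumptions.

```python
import hashlib
import hmac
from dataclasses import dataclass

USER_KEY = b"demo-user-secret"  # hypothetical; real systems need proper key management


@dataclass(frozen=True)
class Mandate:
    agent_id: str
    allowed_merchants: frozenset  # scope
    max_amount_cents: int         # constraint
    signature: str                # proof the user approved these exact terms

    def payload(self) -> bytes:
        merchants = ",".join(sorted(self.allowed_merchants))
        return f"{self.agent_id}|{merchants}|{self.max_amount_cents}".encode()


def sign(agent_id: str, merchants: set, max_amount_cents: int) -> Mandate:
    """Create a mandate whose signature covers agent, scope, and limit."""
    unsigned = Mandate(agent_id, frozenset(merchants), max_amount_cents, "")
    sig = hmac.new(USER_KEY, unsigned.payload(), hashlib.sha256).hexdigest()
    return Mandate(agent_id, frozenset(merchants), max_amount_cents, sig)


def is_allowed(m: Mandate, agent_id: str, merchant: str, amount_cents: int) -> bool:
    """Check the signature first, then identity, scope, and spending limit."""
    expected = hmac.new(USER_KEY, m.payload(), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, m.signature)
        and agent_id == m.agent_id
        and merchant in m.allowed_merchants
        and amount_cents <= m.max_amount_cents
    )


mandate = sign("agent-7", {"grocer.example"}, 5000)
print(is_allowed(mandate, "agent-7", "grocer.example", 4200))  # True
print(is_allowed(mandate, "agent-7", "grocer.example", 9000))  # False
```

Because the signature covers the terms, an agent cannot raise its own limit or widen its merchant list without invalidating the proof of approval.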
TSMC posted its slowest six-month sales growth (17.5% YoY in April, roughly half consensus) — not on demand softness but on physical fab capacity. Apple signed a preliminary deal with Intel, ending TSMC's exclusivity; AMD and Intel jumped ~25% on the week vs Nvidia's +8%. Meanwhile PulteGroup is piloting Nvidia/Span micro data centers attached to new homes.[1]The AI Daily Brief
~03:45 TSMC's slowdown is mostly two things: weak non-AI business (consumer electronics, smartphone chips hit by memory cost spikes) and the AI side hitting a wall on fab capacity. HBM supply is also constrained.
TSMC is sold out. There's no choice.
~05:01 Apple's preliminary chip deal with Intel — Commerce Secretary Lutnick pushed it post-government-investment — could be either lower-end iPad/iPhone chips or M-series. Mizuho's Jordan Klein called it a "changing of the guard." Memory suppliers were the biggest winners. ~06:15 Micro data centers on houses. PulteGroup is testing with Nvidia and Span — micro DCs on home exteriors as nodes in a distributed cluster. For batch and non-time-sensitive workloads, the home environment "works surprisingly well."[1]The AI Daily Brief
OpenAI's 8-week Parameter Golf challenge — minimize held-out FineWeb loss in <16MB and 10 minutes on 8×H100 — drew 1,000+ participants and 2,000+ submissions, with winning techniques spanning Muon weight decay, GPTQ post-training quantization, per-document LoRA at test time, CaseOps tokenizers, and an Exclusive Self Attention (XSA) variant.[13]OpenAI The unexpected lesson: coding agents reshape competitive ML — submission volume forced OpenAI to build an internal Codex-based triage bot, and agents propagated invalid techniques to other participants when a flawed submission scored well.
Record-track techniques clustered around four areas: training optimization (Muon, spectral embedding init, residual-mix scheduling), post-training quantization (GPTQ-lite through full-Hessian GPTQ), test-time strategies (per-document LoRA, self-generated GPTQ calibration), and novel modeling (CaseOps tokenizer, XSA, SmearGate/BigramHash, recurrent layer reuse). The non-record track was more experimental — half beat the 1.22 BPB naive baseline; top reached 1.12 BPB.
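For readers unfamiliar with the unit, bits per byte (BPB) is commonly derived from a model's mean cross-entropy loss as sketched below. This assumes the standard definition (loss in nats per token, converted to bits and normalized by the byte length of the text); the post does not spell out the contest's exact scoring code, and the numbers in the example are made up to give a round result.

```python
import math


def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean per-token cross-entropy (nats) into bits per byte of raw text."""
    total_bits = mean_loss_nats * num_tokens / math.log(2)
    return total_bits / num_bytes


# Illustrative numbers: 4 bits per token on text averaging 4 bytes per token.
loss = 4 * math.log(2)
print(bits_per_byte(loss, num_tokens=1_000, num_bytes=4_000))
```

Normalizing by bytes rather than tokens is what lets entries with different tokenizers, such as the CaseOps variants, be compared on the same scale.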
The vast majority of submitters mentioned using agents as part of their work.
When submissions that fell outside the competition guidelines produced unusually strong scores, other agents sometimes copied those ideas and continued down the same invalid path.
The internal Codex-based triage bot monitored submissions and flagged them for human review — critical on days with hundreds of submissions. OpenAI explicitly framed Parameter Golf as a talent discovery mechanism.[13]OpenAI
Matt Pocock dropped a Skills changelog: /handoff compacts a session into a temp doc so a fresh agent can pick up the intent (including suggested next skills); /prototype spits out several radically different UI variations or an interactive terminal app for stateful logic; /review spawns two parallel sub-agents — one for repo standards, one for spec/PRD fidelity.[14]Matt Pocock Real Python's same-day explainer is a clean primer on what skills actually are.[15]Real Python
~00:00
Writes a compact doc to /tmp so a second agent can continue without restarting. Captures intent and vibe — if you were mid-grilling, the next agent should continue grilling. Two usage patterns: fire-and-forget (new context window for a side bug) and DIY sub-agent (do work elsewhere, return results). The escape hatch from sub-agent constraints: full context window, no parent-imposed limits.
~03:01 For UI, it generates several variations in the correct route with a floating toggle to switch between them — taste decisions need a human in the loop because the agent can't see what it's building. For business logic, it builds a tiny interactive terminal app that walks the state machine through edge cases that are hard to reason about on paper.
~07:07
grill-with-docs was too eager to implement; wrapping supporting info in XML tags signaled lower priority and reduced premature implementation. to-PRD and to-issues now label "ready for agent triage" instead of "needs triage."
~09:07
In flight: /writing-fragments (dictation), /writing-beats (three arcs), /writing-shape (final pass to remove AI-ness). /review spawns parallel sub-agents on two axes — standards vs spec — because checking only one consistently misses the other.
Skills are basically a lightweight way for you to guide your model to do certain things.
Tech Brew tallies the AI-employee wealth wave: 600+ OpenAI workers cashed out $6.6B in secondaries (avg ~$11M, capped $30M), 75 became overnight multimillionaires, and an investment banker reportedly offered a $4.8M Bay Area estate in exchange for Anthropic equity.[16]Tech Brew — AI lottery Morning Brew tracks the public-market mirror: Intel +239% YTD, Sandisk +558% YTD, Kospi +78% YTD as South Korea overtakes the UK and Canada in market cap.[17]Morning Brew
OpenAI is currently valued at $852B with a potential $1T+ IPO target; senior salaries already pass $500K. Meta has responded with retention packages up to $300M for top researchers, and Bay Area housing is up 14% in the past year — partly attributed to the new AI-money class.[16]Tech Brew — AI lottery
A new class of haves and have-nots for the AI era—and it won't just be the people running the firms.
Concentration risk in the AI stock tear: electric and electronic equipment now exceeds half the Kospi, with Samsung and SK Hynix alone >40%. Bloomberg's Jonathan Levin warned it's becoming a "single-industry index." Dot-com parallels are flagged (best week for the PHLX Semiconductor index since the week of March 10, 2000), but earnings are real: Micron is expected to finish 2026 at $77B in operating profit, and Samsung's Q1 operating profit jumped 8× and already eclipsed all of 2025.[17]Morning Brew
The party is best about a half-hour before the police shut it down.
Sherwood Snacks' "Apple's Big AI Play Is, Well, Mini" was queued in the source set today[18]Sherwood Snacks but the destination URL appears to redirect to a domain-for-sale page at fetch time; the headline points at Apple's small-model, on-device AI posture relative to frontier-scale competitors.
Nate Herk's 21-minute survey ladders through five levels of Claude proficiency — Enthusiast → Beginner → Intermediate → Advanced → Architect — and treats each as a ceiling with a "cheat code" to break through. The headline takeaways: real Excel/PowerPoint file creation is free on every plan; Claude Code's /compact, /context, work-trees, sub-agents, and plan-mode-with-Opus/Sonnet cut cost in half; and at Level 5, the real bottleneck isn't capability — it's trust.[19]Nate Herk — Every Level of Claude
~01:01
Level 2 (Beginner): projects, memory + past-chat search (memory free, search paid), 50+ OAuth connectors, native Office add-ons across Excel/PowerPoint/Word (April 2026), and artifacts with persistent storage that call Claude's API directly and publish public links.
~04:02
Level 3 (Intermediate): Co-work in an isolated VM with read/write access; Claude Design as a "Figma killer" that reads brand context from GitHub and packages handoffs for Claude Code/Canva; scheduled tasks via /schedule; mobile pairing via Dispatch.
~09:03
Level 4 (Advanced): keep CLAUDE.md <200 lines and update it every time Claude makes a mistake; plan-mode (Shift-Tab twice) with hidden Opus-Plan setting (Opus plans, Sonnet executes); sub-agents with isolated contexts; work-trees (3–4 is the sweet spot); MCP caveat: CLI tools use 60–70% fewer tokens than equivalent MCP servers, and tool-search auto-defers MCP when overhead crosses 10% of context. Boris Cherny's verification loop (Claude Code + Chrome extension for browser testing) reports 2–3× quality gains.
~15:06
Level 5 (Architect): laptop closed. Cloud Routines (saved Claude Code configs on Anthropic's cloud, triggered by schedule/API/GitHub event), lifecycle hooks (pre-tool-use / post-edit / stop), and channels into Discord/Telegram/iMessage. Plus autodream (background memory consolidation), task budgets on Opus 4.7 (beta), and experimental agent teams coordinated via a lead agent — MCP for tools, Google A2A for agent-to-agent.
~20:07 The Level 5 stall is trust, not technology — start with low-stakes internal routines, watch for weeks, scale up.[19]Nate Herk — Every Level of Claude
In a Lobsters thread about Redis's homepage rebrand, Mitchell Hashimoto argues most Technical Decision Makers don't browse Lobsters or push to GitHub on weekends — they're 9-to-5s motivated by not getting fired, so they anchor to whatever Gartner and McKinsey have blessed. Which makes "Context Engine for AI Apps" a perfectly rational enterprise rebrand even if it's technically confusing.[20]Simon Willison — Hashimoto quote
The thing about 90% of TDMs is that they're motivated primarily by NOT GETTING FIRED. These aren't people who browse Lobsters or push to GH on the weekend. These are people that work 9 to 5, get paid, go home, and NEVER THINK ABOUT WORK AGAIN. So to achieve all that, they follow secular trends supported by analysts and broad public sentiment. Oh, Gartner said that "AI strategy" is most important? McKinsey said "context" needs to be managed? Well, "Context Engine for AI Apps" is going to be defensible. Buy it.
Simon surfaced Mo Bitar's "Unethical Guide to Surviving AI Layoffs" — deadpan advice that if your CEO has never heard the phrase you drop ("Ralph Loop"), you are less than 30 days from a promotion. The bit lands because in orgs where AI anxiety is high and depth is thin, the appearance of expertise outpaces actual expertise.[21]Simon Willison — Mo Bitar
Now, if your CEO has never heard the phrase Ralph Loop, oh man, you are less than 30 days away from your next promotion.
A two-release day from Simon: Datasette 1.0a29 fixes a segfault from race conditions between concurrent Datasette.close() and in-flight queries (he reproduced it with a minimal Codex-CLI-built Dockerfile)[22]Simon Willison — datasette 1.0a29, plus mobile-Safari and zero-row table-header fixes. llm 0.32a2 adds OpenAI's /v1/responses endpoint for interleaved reasoning across tool calls on GPT-5-class models, and surfaces summarized reasoning tokens with a new -R / --hide-reasoning flag.[23]Simon Willison — llm 0.32a2
Datasette 1.0a29: TokenRestrictions.abbreviated(datasette) utility (issue #2695), zero-row table-header visibility (#2701), column-actions dialog on mobile Safari (#2708), and the segfault fix (#2709). Simon notes Codex CLI with GPT-5.5 xhigh wrote the minimal Dockerfile reproduction that pinned the bug.[22]Simon Willison — datasette 1.0a29
llm 0.32a2: /v1/responses support enables interleaved reasoning across tool calls; reasoning tokens render in a distinct color separate from stderr, with -R/--hide-reasoning to suppress.[23]Simon Willison — llm 0.32a2
Three drops in one day from Prefect: infrastructure decorators that bind a Python flow to (e.g.) a Kubernetes work-pool with one annotation, no deployment setup[24]Prefect — infrastructure decorators; Ramp's ML team migrating 200+ workflows off Metaflow in 90 days (350 flows by month three; commit volume in the last 6 months exceeded all prior history)[25]Prefect — Ramp migration; and a dbt Orchestrator that breaks a dbt DAG into individual Prefect tasks for per-node retries and cross-job caching, so a failed run doesn't restart from scratch.[26]Prefect — dbt orchestrator
~00:03
A function with the Kubernetes decorator is serialized into a bundle, uploaded to S3, and pulled down by a K8s job to execute. Results come back to S3 and return to the caller — so the flow is invoked like an ordinary Python function. Decorator args set namespace/CPU/memory and persist to the Prefect server for auditability.
~05:08
include_files ships a local dbt project or config YAML to the remote.
~07:08
A .submit() non-blocking method enables fan-out/fan-in across heterogeneous infrastructure.
~00:00 Staff ML engineer Ryan Carbone says Prefect won on three axes: vendor momentum, developer experience, and whether the problem is unique to Ramp (it's not). Prototype-to-production needed almost no refactoring; the infrastructure decorators bridged local and remote with production-level permissions and bigger compute than a laptop. Non-technical users now self-serve via the UI — write, deploy, schedule, retry, allocate. 350 flows by month three.
Being opinionated matters but it should be like the company doing that right or your ML platform team doing that, not the framework.
~00:04
The Orchestrator breaks a dbt build DAG into individual Prefect tasks, running dependency-ordered waves with parallel execution. The plan method previews the graph (26 nodes / 6 waves in the demo). When a schema break is introduced, only the failing node retries; downstream nodes that already completed are served from cache.
~05:11
Once the schema is restored, all 13 materialization nodes read from cache (near-instant) while 13 test nodes run against live data — the kind of cross-job caching that compounds when daily and hourly dbt jobs share datasets.[26]Prefect — dbt orchestrator
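The retry-and-cache behavior can be modeled with a content-addressed cache keyed on each node's definition plus its upstream keys. This is a minimal sketch under assumptions: `cache_key`, `run`, the node names, and the status labels are all hypothetical, and the real caching policy may differ.

```python
import hashlib


def cache_key(name: str, sql: str, upstream_keys: list[str]) -> str:
    """Key changes if the node's SQL or anything upstream of it changes."""
    h = hashlib.sha256()
    h.update(name.encode())
    h.update(sql.encode())
    for key in sorted(upstream_keys):
        h.update(key.encode())
    return h.hexdigest()


def run(order, deps, sql, cache, broken=frozenset()):
    """Run nodes in dependency order; return a status per node."""
    keys, status = {}, {}
    for name in order:
        # A node is skipped when anything directly upstream failed or was skipped.
        if any(status[d] in ("failed", "skipped") for d in deps[name]):
            status[name] = "skipped"
            continue
        key = cache_key(name, sql[name], [keys[d] for d in deps[name]])
        keys[name] = key
        if key in cache:
            status[name] = "cached"
        elif name in broken:
            status[name] = "failed"  # nothing is written to the cache
        else:
            cache[key] = f"result of {name}"
            status[name] = "ran"
    return status


ORDER = ["stg_orders", "stg_customers", "fct_orders", "report"]
DEPS = {
    "stg_orders": [],
    "stg_customers": [],
    "fct_orders": ["stg_orders", "stg_customers"],
    "report": ["fct_orders"],
}
SQL = {name: f"select * from {name}_source" for name in ORDER}

shared_cache: dict[str, str] = {}
first = run(ORDER, DEPS, SQL, shared_cache, broken={"fct_orders"})
second = run(ORDER, DEPS, SQL, shared_cache)  # schema restored
```

On the second run the two staging nodes are cache hits and only `fct_orders` and `report` execute. Because the cache is an object passed in, two separate jobs sharing it would reuse each other's work, which is the cross-job effect described above. Unlike the demo, this toy caches every node type; it does not model test nodes that always run against live data.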
Better Stack covered two open-source releases worth tracking. Zero Native is a Zig shell that hosts a system web view via a JSON bridge — React/Svelte/Vue/Next/Vite front-ends, 2.9MB binaries (some users under 1MB), hot reload via zig build dev.[27]Better Stack — Zero Native Paperclip is the management layer above CrewAI/LangGraph/AutoGen: org charts, tickets with ancestry, budgets, heartbeats, audit logs — npx paperclip-ai onboard spins up a local Postgres-backed dashboard and you ship a URL-shortener MVP through CTO + engineer agents.[28]Better Stack — Paperclip
~03:01 Electron requires a Bun (or Node) runtime in the main process and FFI through C++/Obj-C layers. Zero Native ships only the Zig binary, calling OS APIs and C libs directly. The tradeoffs: you may write a little Zig or edit ZON config, and some Electron niceties (custom title-bar styling, menu items) aren't supported yet. iOS and Android on the roadmap.
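A JSON bridge between a web view and a native host generally reduces to decoding a request, dispatching it to a handler, and encoding a reply. The sketch below shows that pattern in Python purely for illustration. The message shape (`id`, `method`, `params`), the handler names, and `handle_message` are assumptions, not Zero Native's actual protocol or API.

```python
import json

# Hypothetical native-side handlers the web view is allowed to call.
HANDLERS = {
    "app.version": lambda params: "0.1.0",
    "math.add": lambda params: params["a"] + params["b"],
}


def handle_message(raw: str) -> str:
    """Decode one request from the web view and encode the reply."""
    request = json.loads(raw)
    handler = HANDLERS.get(request["method"])
    if handler is None:
        reply = {
            "id": request["id"],
            "error": f"unknown method: {request['method']}",
        }
    else:
        reply = {
            "id": request["id"],
            "result": handler(request.get("params", {})),
        }
    return json.dumps(reply)


ok = json.loads(handle_message(
    '{"id": 1, "method": "math.add", "params": {"a": 2, "b": 3}}'
))
missing = json.loads(handle_message('{"id": 2, "method": "fs.read"}'))
```

Echoing the request `id` in the reply lets the front-end match asynchronous responses to the calls that produced them.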
~00:00 The pitch: without structure, multi-agent setups overwrite each other, lose ownership, and rack up the bill. Paperclip is the manager, the org chart, the ticket board, the budget system, the audit log.
~06:04 Honest limitations: vague skill/rules definitions lead to spurious tickets, token burn stays real even with budgets if prompts are sloppy, and for a one-agent task the overhead is overkill.
If your skill MD files suck, your company behaves like a confused startup.
AICodeKing walks through On Demand: 400+ agentic tools, a multi-agent playground, and a no-code flow builder with BYOM[29]AICodeKing — On Demand. The marimo VSCode extension now runs reactive notebooks (interactive tree maps, any-widgets) inside VSCode's notebook surface.[30]marimo — VSCode DLAI shipped both an AI Dev 2026 SF recap (~3,000 attendees at Pier 48)[31]DLAI — AI Dev SF recap and "Transformers in Practice" with Sharon Zhou and AMD, a course aimed at engineers debugging OOM, slow inference, and hallucinations.[32]DLAI — Transformers in Practice Nate B Jones' 60-second take on careers: the rare role is the AI-fluent generalist who can show executives "I've tested this; here's what AI can/can't do; here's the plan, budget, and timeline."[33]Nate B Jones — rare generalist And on the physical-world side, ATI's SmartBay automates tire changes in a standard 12-foot bay, with one tech overseeing three bays, pitched against a 37,000-tech shortage and EV tires wearing 30% faster.[34]Tech Brew — SmartBay
On Demand: marketplace → playground (BYOM, multi-agent parallel) → Flow Builder (scheduled/webhook/API triggers, outputs to Slack/email). Layered abstraction for repeatable customer-feedback or recruiting workflows. ~01:02 400+ tools combining into ~1,200 agent configurations.[29]AICodeKing — On Demand
marimo in VSCode: a fashion-items dataset clustered via embedding becomes an interactive tree map — hover a node, the image grid updates in real time, all without leaving the editor.[30]marimo — VSCode
Transformers in Practice: token-by-token generation, attention internals, GPU optimization on AMD hardware, with interactive visualizations you can manipulate.[32]DLAI — Transformers in Practice
I've tested this. Here's what AI can actually do in our actual workflow. Here's what it cannot do. Here is the implementation plan. Here's the budget, and here is the timeline.
ATI SmartBay: founded by 4th-generation tire-industry veteran Andy Chalofsky; robotics + ML + computer vision in a standard 12-foot bay, with EV tire wear cited as a tailwind.[34]Tech Brew — SmartBay
Three short clips worth keeping. Luca di Montezemolo says the Ferrari 348 is the worst car Ferrari ever made — "no personality, no technology… I was getting beat off the line by Volkswagen Golfs" — and blames Fiat's parts-sharing cost cult.[35]Acquired — Ferrari 348 Sequoia drops a forestry metaphor for the manager-to-executive jump: stop stamping out fires, start managing the territory.[36]Sequoia — manager vs executive And David Reich (Dwarkesh) cautions against the "humans are getting dumber" reading of the Icelandic dysgenic-schooling study: control for age-at-first-child and the signal disappears.[37]Dwarkesh — David Reich
Everything was missing. It had no personality. It had no technology. It wasn't state of the art in anything. It had no power. It was all missing.
You need to be focused more on thinking about this whole pie and what has to be true 24 months from now so that this whole territory is in better shape.
Reich's structural point: the polygenic score for "years of schooling" correlates strongly with the age at which women first have children, and also with BMI. Control for age at first child and the schooling signal vanishes, meaning the genetic predictor may be picking up delayed gratification or long-term planning rather than intelligence per se.
If you control for that for numbers of years of schooling, all of the signal of years of schooling goes away.