May 20, 2026
Google I/O day two dropped Gemini 3.5 Flash and a sprawling agent stack, but third-party benchmarks and hands-on tests landed somewhere between unimpressed and hostile. Artificial Analysis pegs the new Flash at roughly 5.5x the real cost of Gemini 3 Flash and trailing several cheaper coders [1]Better Stack — Gemini 3.5 Flash is just... fine, AICodeKing calls Antigravity 2.0 and 3.5 Flash “dead on arrival” [2]AICodeKing — Antigravity 2.0 & Gemini 3.5 Flash (Fully Tested): SO BAD! FINAL NAIL IN THE COFFIN FOR GOOGLE., Theo records a 22-minute crash-out flagging hidden pricing and a Fish-Slap agent eval where 3.5 Flash fails outright [3]Theo - t3.gg — I'm scared to make this video, and Simon Willison — a Gemini fan in normal times — explicitly tags most of I/O as vaporware [4]Simon Willison's Weblog — Google I/O, Gemini Spark, Antigravity. Google's own DeepMind video still pitches the 4x faster output tokens and GDP-bench wins [5]Google DeepMind — Gemini 3.5 Flash has landed., and Nathaniel Whittemore frames the whole event as confused and unfocused [6]The AI Daily Brief: Artificial Intelligence News — The Most Important AI News from Google I/O.
Google's headline pitch was “better than 3.1 Pro on nearly every benchmark + 4x faster output tokens” — Better Stack confirms the throughput claim at 278 tokens/sec [1]Better Stack — Gemini 3.5 Flash is just... fine but pulls Artificial Analysis numbers showing 3.5 Flash actually costs ~5.5x Gemini 3 Flash per task because it spends more tokens to get there, and trails Haiku 4.5 and GPT-5.1 Mini on coding. Theo's ~03:03 segment digs into the token-bloat math and points out that Google buried the real per-token price in fine print [3]Theo - t3.gg — I'm scared to make this video; his ~08:05 Fish Slap agentic test has 3.5 Flash failing while GPT-5.5 ships in the same harness.
AICodeKing's hands-on at ~05:10 benchmarks the model against Sonnet, GPT-5.5, and Haiku 4.5 on practical coding and concludes Flash gets beaten across the board [2]AICodeKing — Antigravity 2.0 & Gemini 3.5 Flash (Fully Tested): SO BAD! FINAL NAIL IN THE COFFIN FOR GOOGLE., and at ~06:10 recommends staying on those alternatives.
It's mid at best on every real coding test I ran — and yet they want $200/month for the Antigravity tier. Just no.
Nathaniel Whittemore reads I/O as a slate of features that simply did not have a focal point — Gemini Omni, Spark, Antigravity, and Flash all launched the same day with no clear flagship [6]The AI Daily Brief: Artificial Intelligence News — The Most Important AI News from Google I/O, while Google DeepMind's own launch video leans hard on the GDP-bench wins and the latency story [5]Google DeepMind — Gemini 3.5 Flash has landed..
Simon Willison — whose policy is to only write about features he can personally test — admits I/O 2026 fell largely outside that scope because so much was preview or gated. The one clearly-released piece (3.5 Flash) he covers in a separate post; everything else is “coming soon” [4]Simon Willison's Weblog — Google I/O, Gemini Spark, Antigravity.
Tech Brew calls out what Whittemore independently makes the same point about: nobody at Google appears to own a coherent naming taxonomy. Gemini sits inside AI Studio sits inside AI Mode sits next to AI Overviews; Spark sits inside Gemini, runs on Flash, runs on Antigravity. Even Google's own keynote presenters seemed unsure whether Spark was a feature or a product [7]Tech Brew — Google it (while you still can).
Even Google's own keynote presenters seemed unsure whether Spark was a feature of Gemini or a separate product. The naming problem is now a strategy problem.
Simon Willison also highlighted Mike Veerman's interactive tool that simulates LLM token output from 5 to 800 tokens per second — a useful companion piece when product and design audiences need to actually feel what Google's “4x faster output tokens” claim translates to on screen [8]Simon Willison's Weblog — How fast is 10 tokens per second really?.
Google launched Gemini Spark as its answer to a personal AI agent, with native hooks into Gmail, Calendar, Drive, Docs, Sheets, Slides, YouTube, and Maps, running on Gemini 3.5 Flash and the new Antigravity platform [4]Simon Willison's Weblog — Google I/O, Gemini Spark, Antigravity. Tech Brew frames it as a “24/7 personal agent” positioned squarely against OpenAI and Anthropic's consumer agent pushes [7]Tech Brew — Google it (while you still can), and Whittemore reads the product as confusingly scoped — it does a lot, but isn't clearly Pro-tier or free-tier [6]The AI Daily Brief: Artificial Intelligence News — The Most Important AI News from Google I/O. Willison flags the prompt-injection surface area as a potential “Challenger disaster” for agent security.
From the FAQ Willison surfaces: Spark is pitched as “your personal AI agent” that “connect[s] natively with your favorite Google apps like Gmail, Calendar, Drive, Docs, Sheets, Slides, YouTube, and Google Maps” and explicitly runs “on Gemini 3.5 Flash and Antigravity” [4]Simon Willison's Weblog — Google I/O, Gemini Spark, Antigravity. Every task executes in a “fresh, strictly isolated, ephemeral VM,” with traffic gated through a secure Agent Gateway enforcing DLP policies.
Willison's read: an agent that can read your inbox and act on your behalf is precisely the system adversarial content in those inboxes is designed to hijack [4]Simon Willison's Weblog — Google I/O, Gemini Spark, Antigravity. He doesn't soften the framing.
An agent that can read your email and calendar and act on your behalf is exactly the kind of system that adversarial content in those sources could hijack.
Tech Brew's coverage at the company-strategy level positions Spark as Google trying to occupy the consumer-agent slot before OpenAI ships ChatGPT Agent 2 and Anthropic ships Co-work as default [7]Tech Brew — Google it (while you still can). Whittemore at ~12:06 reads the launch as “a confusingly positioned 24/7 personal agent” [6]The AI Daily Brief: Artificial Intelligence News — The Most Important AI News from Google I/O.
Gemini Omni is the day's most genuinely novel piece: an any-to-any multimodal model that combines Gemini's reasoning with VEO, Nano Banana, and Genie to do conversational video editing rather than competing head-on with Sora 3 [9]Google DeepMind — Build your next story with Gemini Omni.. Whittemore reads it as “a Nano-Banana moment for video editing,” not a V4 contender [6]The AI Daily Brief: Artificial Intelligence News — The Most Important AI News from Google I/O. Patrick Löber's AI Engineer talk on the same architecture explains why native multimodal generation matters — the model can ground audio in the same world-knowledge the language model uses, instead of routing through text [10]AI Engineer — Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind.
The DeepMind launch video pitches Omni as a single model that takes any input and produces any output, internally orchestrating VEO (video), Nano Banana (image), and Genie (worlds) under Gemini's planner. The headline demo is conversational video editing — “make the dog turn around,” “swap the time of day” — without re-rendering from scratch [9]Google DeepMind — Build your next story with Gemini Omni..
Löber's talk argues the unlock of any-to-any models is that media generation gets access to the LM's world knowledge instead of being a downstream call ~11:25. The Live API at ~13:28 demonstrates the same logic for audio: a single audio-to-audio architecture for real-time interaction rather than ASR+LM+TTS. He also surfaces multimodal embeddings and Gemma 4 local agents at ~15:00 [10]AI Engineer — Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind.
Whittemore at ~09:05 sees Omni as the right framing — Google is winning at editing creative media, not at generating it from a blank canvas, and Omni leans into that [6]The AI Daily Brief: Artificial Intelligence News — The Most Important AI News from Google I/O.
Antigravity 2.0 is Google's new agentic runtime — desktop app, Go-based CLI, open-source Python SDK wrapper, and a VS Code fork IDE — and it ships with the announcement that the open-source TypeScript Gemini CLI gets shut down June 18, 2026 in favor of the closed-source Antigravity CLI [4]Simon Willison's Weblog — Google I/O, Gemini Spark, Antigravity. Theo's hands-on at the CLI is brutal: bugs in auth, sub-agents that look like Codex clones, and a launch that smells dysfunctional even by Google standards [3]Theo - t3.gg — I'm scared to make this video. AICodeKing concurs on the desktop app — Codex clone with UX regressions — and recommends sticking with existing tools [2]AICodeKing — Antigravity 2.0 & Gemini 3.5 Flash (Fully Tested): SO BAD! FINAL NAIL IN THE COFFIN FOR GOOGLE..
Willison's recap: Antigravity is a desktop app, a CLI written in Go, an open-source Python SDK wrapper, and a VS Code fork [4]Simon Willison's Weblog — Google I/O, Gemini Spark, Antigravity. The platform runs on Google Cloud with ephemeral VMs, an Agent Gateway, DLP, and encrypted credentials.
The existing Gemini CLI — Apache 2.0 TypeScript — stops working on June 18, 2026. Its replacement is the closed-source Antigravity CLI. Willison notes this open-to-closed reversal without enthusiasm; developers get roughly a month to migrate [4]Simon Willison's Weblog — Google I/O, Gemini Spark, Antigravity.
Theo at ~10:06 walks through bugs in the Antigravity CLI's authentication and basic UX, and at ~20:09 argues that Antigravity is straight-up copying Codex — sub-agents, planning panels, the works [3]Theo - t3.gg — I'm scared to make this video.
AICodeKing's CLI walkthrough at ~02:08 hits the same auth issues, and the desktop-app review at ~03:09 calls it a Codex clone with regressions [2]AICodeKing — Antigravity 2.0 & Gemini 3.5 Flash (Fully Tested): SO BAD! FINAL NAIL IN THE COFFIN FOR GOOGLE.. Better Stack at the model level adds that Antigravity 2 is positioned as a Codex competitor but ships with all the rough edges Codex shed eight months ago [1]Better Stack — Gemini 3.5 Flash is just... fine.
Tech Brew bundles the not-Gemini parts of I/O: the first major Google Search redesign in 25 years (stacked-card AI-first layout), a YouTube Search overhaul with conversational interface, and a smart-glasses push co-branded with Warby Parker and Gentle Monster [7]Tech Brew — Google it (while you still can). Google Research separately announced Beam group-meetings with life-size remote attendees on HP Dimension — an experiment showing measurable presence improvements [11]Google Research — A new experiment brings better group meetings to Google Beam.
Tech Brew describes the new layout as a stacked-card interface where AI Overviews sit above results, with the traditional 10 blue links pushed down. The rollout is gradual and US-first [7]Tech Brew — Google it (while you still can).
The YouTube redesign treats search as a conversation — query expansion, follow-up questions, and timestamped clip retrieval inside the search bar [7]Tech Brew — Google it (while you still can).
Co-branded hardware with Warby Parker and Gentle Monster is the bet that prescription-friendly, design-led frames are the only way smart glasses become daily-driver [7]Tech Brew — Google it (while you still can).
Google Research separately announced that Google Beam — the rebranded Project Starline — can now match remote participants to true-to-life size on HP Dimension's immersive display, eliminating the “tiny head on a screen” cue that makes remote attendees feel optional. Early enterprise pilots show measurable improvements in perceived presence and participation [11]Google Research — A new experiment brings better group meetings to Google Beam.
An internal OpenAI reasoning model autonomously disproved the Erdős unit distance conjecture — a central open problem in combinatorial geometry that had stood roughly 80 years — using techniques from algebraic geometry. The result has been independently verified [12]OpenAI — An OpenAI model has disproved a central conjecture in discrete geometry. OpenAI's promo video features the researchers walking through how the model produced the proof end-to-end [13]OpenAI — The Erdős Breakthrough. Last Week in AI's coverage frames this as the most significant “model does real math” moment since AlphaProof [14]Last Week in AI — Last Week in AI #245 - TML-Interaction, Claude For Legal, Sam Altman on Stand.
The unit-distance conjecture bounds how many pairs of points in n points in the plane can be at unit distance from each other. OpenAI's model produced a construction that exceeds the conjectured bound — i.e., a disproof rather than a proof — and used tools from algebraic geometry to build it [12]OpenAI — An OpenAI model has disproved a central conjecture in discrete geometry.
The Erdős problem is not in the AlphaProof / IMO style of well-defined competition mathematics: it's a famous open problem from a famously prolific problem-poser. The proof itself is novel enough that humans had to study it to verify. OpenAI's video at ~00:00 opens with the researchers explaining their own surprise [13]OpenAI — The Erdős Breakthrough.
SpaceX's SEC S-1 filing inadvertently disclosed a $1.25-billion-per-month cloud services agreement with Anthropic, running through May 2029 and using the COLOSSUS and COLOSSUS II clusters [15]Simon Willison's Weblog — Quoting SpaceX S-1. The deal is large enough that Last Week in AI dedicates a 14-minute segment to its implications, including how it slots alongside Anthropic's existing Google $40B Anthropic investment and the broader neocloud restructuring [16]Last Week in AI — Last Week in AI #244 - GPT-5.5 Instant, Grok 4.3, OpenAI vs Musk.
Buried in SpaceX's S-1: a multi-year compute contract with Anthropic at $1.25B/month, through May 2029, running on COLOSSUS and COLOSSUS II. Willison points out the obvious — disclosure was almost certainly unintentional, and the run rate puts Anthropic compute at well over $15B/year just from this one provider [15]Simon Willison's Weblog — Quoting SpaceX S-1.
LWIA #244 at ~51:50 connects the dots: Anthropic's compute mix now spans Google ($40B investment), AWS Bedrock, and SpaceX (COLOSSUS); the orbital-data-center angle is no longer pure science fiction [16]Last Week in AI — Last Week in AI #244 - GPT-5.5 Instant, Grok 4.3, OpenAI vs Musk.
If Anthropic is paying $1.25 billion a month for compute on COLOSSUS, the next question isn't whether scaling laws still hold — it's who can actually finance the next decade of training.
Anthropic acquired Stainless for $300M — the API/SDK company behind OpenAI, Anthropic, and other major LLM providers' developer experiences. Every's Dan Shipper interviews founder Alex Rattray on what the deal means for MCP, agent UX, and the “dendrites of the internet” framing [17]Every — Anthropic Just Bought a Dev Tools Startup for $300M. Here's What Its Founder Told Me.. LWIA #245's segment also notes Stainless's role in fixing MCP's current scaling problems [14]Last Week in AI — Last Week in AI #245 - TML-Interaction, Claude For Legal, Sam Altman on Stand.
Rattray opens at ~04:15 by reframing APIs as “dendrites of the internet” — the connective tissue that lets agents reach beyond their model context [17]Every — Anthropic Just Bought a Dev Tools Startup for $300M. Here's What Its Founder Told Me.. At ~09:18 he explains why MCP isn't working well at scale: the protocol assumes per-tool descriptions, but the descriptions themselves don't compose. At ~11:20 Shipper teases the “refund my stripey socks across 5 apps” demo as the agentic North Star.
The Every read: Stainless is one of the few companies sitting on the actual taxonomy of how thousands of APIs differ, plus the tooling to normalize them. For Anthropic, that's a foundational lever for Claude Co-work and the agent SDK ecosystem [17]Every — Anthropic Just Bought a Dev Tools Startup for $300M. Here's What Its Founder Told Me..
Meta is laying off roughly 7,800 employees — about 10% of its ~78,000-person workforce — while simultaneously redirecting 7,000 remaining employees into AI-related initiatives and closing 6,000 open non-AI reqs. The framing is explicit: capex and headcount alike are being recomposed toward AI [18]Morning Brew — Meta lays off 10% of workforce.
Morning Brew's read: this is not a cost-cut, it's a portfolio rebalance. The reduction in non-AI hiring (6,000 reqs closed) is almost as large as the layoff itself, and the 7,000 internal AI redeployments make the net effect a near-zero headcount change with a different mix [18]Morning Brew — Meta lays off 10% of workforce. The deeper signal is that Meta is treating its own non-AI org chart as the largest available source of AI engineers.
Nate B Jones argues the companies that actually decide whether AI agents ship are not OpenAI or Anthropic — they're the seven infrastructure providers controlling runtime, identity, data, payments, observability, the kill switch, and the cross-cutting governance framework [19]AI News & Strategy Daily | Nate B Jones — These 5 Infrastructure Giants Secretly Rule AI. The piece doubles as a pre-launch worksheet: any of those seven still TBD is a production blocker. Data Science Weekly's headline this week makes the adjacent argument from the other direction — tools, not bigger models, are the missing ingredient for production-grade agents [20]Data Science Weekly — Tools: Why AI Agents Need Them.
Nate maps each layer to specific incumbents at ~02:02 through ~13:07: runtime → Cloudflare Agents SDK, AWS Bedrock Agent Core, Vercel AI Gateway; identity → Auth0 (Ozero), Okta for AI agents, Microsoft Entra Agent ID, WorkOS; data governance → Snowflake Cortex, Databricks Mosaic AI; payments → Stripe, Visa, Mastercard, Amex; observability → Datadog LLM, Langsmith, Braintrust, Langfuse; kill switch → layered across runtime/identity/gateway/payment/framework (e.g., LangGraph interrupts) [19]AI News & Strategy Daily | Nate B Jones — These 5 Infrastructure Giants Secretly Rule AI.
The companies that decide whether your agent is successful are the ones that are building the layer that determines whether agents can act.
At ~13:07 he closes with the kill-switch insight that telling the model to stop is never the kill switch.
If the only way to tell your agent to stop is to just tell the model to stop, you don't have a kill switch.
Andrey Kurenkov and Jeremy Harris cover OpenAI's GPT Realtime 2 stack, Thinking Machines' surprise TML-Interaction Small launch, Anthropic's vertical push with Claude for Legal, Musk v. OpenAI testimony, Anthropic's “Teaching Claude Why” alignment research, Jack Clark's prediction of fully automated AI R&D by 2028, and METR's horizon evaluation putting Claude Mythos at a ~16-hour task horizon [14]Last Week in AI — Last Week in AI #245 - TML-Interaction, Claude For Legal, Sam Altman on Stand.
The episode opens on real-time voice as a newly contested frontier ~03:12: GPT Realtime 2 ships powered by GPT-5 with 128k context, tunable reasoning effort, and a real-time Whisper variant. Days later, Thinking Machines (Mira Murati's lab, quiet since February) drops TML-Interaction Small at ~12:15: a 276B-parameter MoE targeting ~400ms end-to-end latency with a system-1/system-2 architecture, bitwise-aligned training/inference, custom MoE kernels, and persistent GPU sessions.
The middle section ~24:29 is dominated by Anthropic's Claude for Legal — a GitHub-distributed bundle of agent skills, MCP connectors (Docusign, Ironclad, iManage, LexisNexis, Everlaw, Box), and partnerships with Harvey, Lora, the Free Law Project, and the Justice Technology Association. Anthropic disclosed that legal is now the #1 power-user job in Claude Co-work with 3x the usage of any other function. Jeremy spends a chunk of the segment on TSMC-vs-Amazon-Basics: is Anthropic a neutral platform or will it Amazon-Basics Harvey's $11B valuation?
It's the knockoff shoe... applied to AI.
At ~44:48 Musk v. OpenAI testimony: Altman, Ilya Sutskever, and others on the stand.
Ilya's responses were praised for their depth of reasoning while Elon received praise for his low latency and high batch size.
At ~57:59 a wild segment on a Chinese gray market reselling Claude API access at 90% off — “the knockoff shoe applied to AI.” Then at ~71:07 Anthropic's “Teaching Claude Why” research: training on ethical reasoning generalizes far better than training on specific behaviors.
Show me the why and not the what to get that generalization.
The final stretch covers Jack Clark's 2028-automated-AI-R&D prediction ~78:11, METR's audit of Anthropic's automated-R&D safety case ~95:26, and the headline METR horizon-eval result ~101:29: Claude Mythos at ~16 hours of task horizon.
Alignment is a compounding error problem. If you're doing recursive self-improvement, your alignment may be 99.9%, but generation on generation, once you do 500 generations, now you're down to 60%.
Andrey and Jeremy cover GPT-5.5 Instant's benchmarks and cyber-risk classification, the “goblin” RL-training-leakage quirk that surfaced across GPT-5 generations, Grok 4.3 (with a narcolepsy issue), Anthropic adding Dreaming and Outcomes plus multi-agent orchestration to Claude, the Musk v. OpenAI Brockman diary, the Anthropic-SpaceX Colossus deal in detail, banks offloading AI data-center debt via SRTs (Big Short echoes), and Claude Opus 4.7 autonomously building an AlphaZero pipeline for Connect 4 [16]Last Week in AI — Last Week in AI #244 - GPT-5.5 Instant, Grok 4.3, OpenAI vs Musk.
Episode opens at ~01:11 on the Mythos PR-stunt pushback and broader AI-skepticism reckoning, then at ~10:18 dives into GPT-5.5 Instant's benchmarks and where it landed in OpenAI's cyber-risk preparedness framework. The “goblin” segment at ~15:24 is a fun detour on what happens when RL training leaks across model generations.
Grok 4.3 lands at ~23:31, with the “narcolepsy” issue documented in passing. Anthropic's Dreaming + Outcomes + multi-agent orchestration at ~35:38. The Musk v. OpenAI segment at ~41:41 covers the Brockman diary, Siobhan Zilis as conduit, and Murati testimony. The Anthropic-SpaceX Colossus deal gets a 14-minute treatment at ~51:50 — this is the same story Willison flagged from the S-1.
At ~65:02 the systemic-risk segment: banks offloading AI data-center debt via Significant Risk Transfers (SRTs), with explicit Big Short comparisons. Anthropic's Natural Language Autoencoders and unverbalized eval awareness at ~73:08. Then research-corner at ~97:26 on recursive multi-agent systems passing activations instead of text, and the closer at ~108:28: Claude Opus 4.7 autonomously building an AlphaZero pipeline for Connect 4.
GPT-5.5 release and pricing alongside the Anthropic compute gap, the GPT-5.5 regression on an internal AI-research benchmark and a related “goblin” quirk + a digression into AI consciousness, xAI Grok Voice Think Fast 1.0 and Claude creative-tool connectors, DeepSeek V4's hybrid compressed attention + 1M-token context + Huawei co-optimization, Tencent Hunyuan 3 and the ClawMark benchmark, Google's $40B Anthropic investment plus Meta-AWS Graviton, OpenAI/Microsoft restructuring, the Musk v. OpenAI trial, AISI sabotage evals, Delegate-52 document corruption, temporal sparse autoencoders, and sign-bit attacks [21]Last Week in AI — Last Week in AI #243 - GPT 5.5, DeepSeek V4, AI safety sabotage.
Pricing-and-compute opener at ~02:11, GPT-5.5 regression on internal AI-research benchmark at ~13:18 with the goblin quirk and an AI-consciousness digression. xAI Grok Voice + Claude creative connectors at ~20:23. DeepSeek V4's architecture and context window at ~26:31; Tencent Hunyuan 3 and ClawMark at ~43:40. The business-deals block at ~49:45: Google's $40B in Anthropic, Meta-AWS Graviton, China blocks Mana, OpenAI/Microsoft. Musk v. OpenAI trial + DOJ-Anthropic case + Gemini on-prem + Ineffable Intelligence funding at ~63:55. Safety research wraps the episode at ~79:07: AISI sabotage evals, Delegate-52 document corruption, temporal sparse autoencoders, and sign-bit attacks [21]Last Week in AI — Last Week in AI #243 - GPT 5.5, DeepSeek V4, AI safety sabotage.
swyx interviews Railway founder Jake Cooper on building the agent-native cloud: 3M users, 100k signups/week, 3-month payback on bare-metal data centers, and a deliberate refusal to use Kubernetes [22]Latent Space — The Agent-Native Cloud: 3M Users, 100K Signups/Wk, Data Centers, & Death PRs — Jake Cooper, Railway. The deeper argument is that what agents want from infrastructure — versioning, observability, feature flags at 1000x scale — reshapes what cloud platforms have to ship.
Cooper opens at ~02:00 framing Railway as “the easiest way to ship anything,” then walks the growth chart at ~07:02: free-tier era, the compaction crisis, and getting back to 100K signups/week. The infrastructure primitives discussion at ~14:04 is the bluntest part of the interview — Railway pointedly does not use Kubernetes.
At ~16:06 the economics: 3-month payback on bare-metal data centers, debt-financed cloud-burst capacity, and why owning the floor matters. The agent-native part starts at ~26:11 — versioning, observability, and feature flags must work at 1000x current scale because each user now has an army of agents pushing to prod. At ~32:14: the death of push-pull deploys, with canvas as the new output and CLI as the agent surface.
The closing third gets philosophical: at ~46:20 Cooper is openly skeptical of full AI SRE, defending the spec/code/tests trinity and the dream of self-replicating infra. At ~55:29 he closes on Heroku's slow death, a Temporal critique, and his own founder/focus philosophy [22]Latent Space — The Agent-Native Cloud: 3M Users, 100K Signups/Wk, Data Centers, & Death PRs — Jake Cooper, Railway.
Gergely Orosz interviews Alice Ryhl — Tokio maintainer, Google Android Rust engineer, Rust kernel contributor — on why Rust is structurally different from C++ and TypeScript, how the language is governed without a BDFL, and what AI-assisted coding does (and doesn't) change about kernel work [23]The Pragmatic Engineer — Why Rust is different, with Alice Ryhl.
Alice's path at ~02:00: Minecraft mods to Tokio to Android. The TypeScript pitch at ~07:03 — no null, the ? operator, doc tests, exhaustive matches — sets up the C++ pitch at ~13:07: memory safety eliminates a category of CVEs, period.
Ownership and the borrow checker at ~19:10: the right framing for newcomers is “rethink your data structures,” not “fight the compiler.” unsafe as escape hatch and Vec being built on top of it at ~26:11. Cargo + crates ecosystem at ~31:12, with Linus's package-manager gripe acknowledged.
Governance at ~35:15: teams, RFCs, ACPs, MCPs, and Final Comment Periods. Editions vs versions and the 6-week release cycle at ~46:24. At ~52:32: Rust in the Linux kernel — no longer experimental as of December 2025. And at ~55:35 the AI question — using LLM-assisted coding inside the Tokio repo and for kernel code review [23]The Pragmatic Engineer — Why Rust is different, with Alice Ryhl.
Nerd Snipe interviews Pete (the Claude Code/OpenClaw creator) on the screenshot that went viral: $1.3M of Codex tokens over 603 billion tokens in “Codex Bar.” The deeper interview unpacks what that money actually bought — an agent fleet doing per-commit security, meeting-listening PR bots, and continuous claw-sweeping — plus how Theo deliberately gimps his own automations to avoid prompt-injection and supply-chain risk [24]Nerd Snipe — How the OpenClaw creator uses $1.3 million of tokens.
Pete's screenshot at ~02:00: $1.3M, 603B tokens. Fast Mode + 2.5x pricing explanation at ~05:30 — what Codex Bar actually measures. The agent fleet at ~07:05: Claw Sweeper, per-commit security agents, meeting-listening PR bots.
At ~11:09 Theo explains why he caps his own automations — prompt injection + supply-chain risk make “agent does X automatically” a real attack surface. Gary Tan's $10K/month thesis and GBrain at ~14:09. Anthropic's interactive-vs-programmatic billing split + the -p flag ban at ~22:19. Mark Cuban's token-tax framing at ~40:29. Closes at ~49:35 on Hashimoto on AI psychosis and the Bun Zig-to-Rust rewrite as evidence [24]Nerd Snipe — How the OpenClaw creator uses $1.3 million of tokens.
Matt Williams and Ryan's May 19 chat ranges across whether “AI engineer” is the new DevOps, conference vector-DB tooling, a 1M-node Datadog integration graph in Obsidian, NousResearch Hermes heartbeat agents, Matt Pocock's skill library, Ollama 0.30 reverting to llama.cpp, and hardware corner (broken A7R5, M5 Max MacBook Pro) [25]Matt Williams — Matt and Ryan have a chat on May 19, 2026.
Opens at ~00:00 with “is AI engineer the new DevOps” and the AI Engineer conference vibe-check. At ~11:08 product theater and transcripts-as-conference-vector-DBs. DevOps Days Boston war story at ~15:14: $100k hotel + $60k microwave uplink.
Obsidian as second brain at ~19:20: a 1M-node Datadog integration graph + LLM-driven pruning. NousResearch Hermes heartbeats at ~29:28 — burning 30M tokens on RSS-able background tasks. Matt Pocock's skill library at ~38:34: grill-me, grill-with-docs, ubiquitous language, PowerShell vs cmd. Ollama 0.30 reverting to llama.cpp + MLX 2x speedup + Toto v2 at ~50:46. Hardware corner at ~44:40: broken A7R5, Sony A7R6 pre-order, M5 Max MacBook Pro. Closes at ~55:51 on new-machine hygiene — dotfiles, pnpm vs npm, decision logs for brew [25]Matt Williams — Matt and Ryan have a chat on May 19, 2026.
Dwarkesh Patel and geneticist David Reich on the Middle Paleolithic revolution: the standard narrative focuses on the 50–100k-year cognitive shift, but Reich argues the 3–400k-year transition to mined, transported flint cores may be the bigger evolutionary inflection [26]Dwarkesh Patel — The Stone Age Breakthrough Hiding in Plain Sight - David Reich.
At ~00:00 Reich lays out the orthodox 50k-event view; at ~02:00 he reframes the 3–400k-year shift as evidence of long-distance planning and proto-trade — humans were already moving worked flint across continents long before the symbolic explosion [26]Dwarkesh Patel — The Stone Age Breakthrough Hiding in Plain Sight - David Reich.
A short Lenny's Podcast clip arguing that “there's more change in war than there is in consumer electronics in the next 2 years.” The guest cites the daily iteration cycle on Ukrainian drone designs as proof that hardware velocity has shifted to defense, and argues for re-industrialization as a national-security imperative [27]Lenny's Podcast — The impact of war on the hardware industry.
EO short with the founder of a $22B company on how childhood ballet conditioned her for delayed-reward work: rehearsing a year for one hour on stage taught her to accept slow compounding, and a teenage 3-year goal-setting habit became the through-line [28]EO — What Ballet Taught the Founder of a $22B Company.
Andrew Ng's AI Dev 26 keynote argues software is now LEGO bricks: more building blocks, more AI assemblers, and small generalist teams shipping at 10–100x speed. The downstream effect is that PM, design, legal, marketing, and sales become the new bottlenecks — and he closes by announcing Code Dream, his conversational learning environment on Codoji [29]DeepLearningAI — AI Dev 26 x SF: Andrew Ng: The Future of Software Engineering.
Opens at ~00:07 on LEGO bricks: LLMs, RAG, agentic workflows, UI components, databases, auth — all combinable at unprecedented speed. At ~02:10 he reframes the “% AI vs human coding” debate: his work is 100% AI-written, and once humans must hand-review every line, review itself becomes the bottleneck.
The PM-bottleneck argument at ~04:11: classic 1:8 PM-to-engineer ratios collapse toward 1:1 and into a single generalist who shapes products and codes. Hiring at ~06:13: AI-native engineers must use coding agents (Claude Code, Gemini, Codex, OpenCode), know the building blocks deeply, and have generalist PM skills.
Downstream bottlenecks at ~08:15: design, legal, marketing, sales — and he pushes back on the “job apocalypse” framing. Context Hub announcement at ~11:17 for up-to-date API docs. Code Dream announcement at ~14:19: conversational learning on Codoji [29]DeepLearningAI — AI Dev 26 x SF: Andrew Ng: The Future of Software Engineering.
Paige Bailey's AI Dev 26 talk rolls through recent Google model releases and demos: AI Studio's YouTube video understanding + Compare Mode, Gemini 3.1 Flash Live (screen share, multilingual voice, camera), Gemma 4 open models (Apache 2, 2B–31B), AI Studio Build for voice-driven app creation and Lyria 3 music, Genie 3 world models with the “cat-on-jetpack” demo, and VO 3.1 video generation including a Chick-fil-A ad recreation [30]DeepLearningAI — AI Dev 26 x SF | Paige Bailey: What's New and What's Next in AI.
Roll call at ~00:07. AI Studio YouTube understanding + Compare Mode at ~03:09. Gemini 3.1 Flash Live at ~11:13: screen share + multilingual voice + camera in one session. Gemma 4 open models at ~17:24. AI Studio Build at ~20:28 with Lyria 3 music. Genie 3 at ~27:31 — the cat-on-jetpack world-model demo. VO 3.1 at ~35:45 with the Chick-fil-A ad recreation [30]DeepLearningAI — AI Dev 26 x SF | Paige Bailey: What's New and What's Next in AI.
Marc Manara from OpenAI's startup partnerships team on what OpenAI optimizes for in coding (preambles, token efficiency, long-horizon tool calling), what's still brittle (ambiguous intent, tool selection at scale), where enterprise adoption is moving fastest (legal, healthcare, vertical scale-up software), and what the next unlock is: trusting Codex on multi-hour trajectories until the human becomes the bottleneck [31]DeepLearningAI — AI Dev 26 x SF | A Fireside Chat with OpenAI's Marc Manara.
Pre-launch model tuning at ~00:07. What OpenAI optimizes for at ~03:09: preambles, tokens, long-horizon tool calling. Still brittle at ~07:11: ambiguous intent and tool selection. Next unlock at ~10:13: Codex on multi-hour trajectories, humans as bottleneck. Startup-taste argument at ~11:14: 5–10 person teams hitting tens of millions ARR. Enterprise verticals at ~17:17: scale-up software, Harvey + Legora, Abridge + Ambience. Closes at ~22:21: the abstraction layer moves up, “engineer” expands [31]DeepLearningAI — AI Dev 26 x SF | A Fireside Chat with OpenAI's Marc Manara.
Jeff Huber (Chroma) on agentic search and the new “Context 1” model: context is the underrated half of AI capability, long context windows aren't the answer (context rot), and agentic search needs both read and write paths with continuous learning at the context layer [32]DeepLearningAI — AI Dev 26 x SF | Jeff Huber: Everything You Need to Know About Agentic Search.
Chroma background at ~00:07. The thesis “AI = context + reasoning” at ~03:08. Context rot at ~07:09: long context windows don't solve the problem. Agentic search read/write paths at ~12:12. Context 1 model at ~15:14: small, fast, cheap, agentic-search-shaped. Three predictions at ~20:16: continuous context, extreme speed, continual learning at the context layer [32]DeepLearningAI — AI Dev 26 x SF | Jeff Huber: Everything You Need to Know About Agentic Search.
Adit Abraham (Reducto) on why PDF processing is still hard, the industry shift from chatbots to action-based agents, and how Reducto's agentic OCR pipeline (VLMs + speculative decoding, dynamic Markdown vs HTML output, Deep Extract) outperforms traditional CV on enterprise documents [33]DeepLearningAI — AI Dev 26 x SF | Adit Abraham: Better Agents with Better Data.
Reducto overview + data-bottleneck framing at ~00:07. Industry shift at ~04:08: chatbots → action-based agents. PDF complexity at ~07:11: silent failures + enterprise edge cases. CV vs VLM with speculative decoding at ~10:12. Formatting for the consumer at ~13:14: dynamic Markdown vs HTML tables for RAG retrieval. Agent harnesses + Deep Extract at ~19:16 [33]DeepLearningAI — AI Dev 26 x SF | Adit Abraham: Better Agents with Better Data.
Aditi Gupta (Redis) on building SRE agents with the Redis Context Engine: enterprise Redis is complex enough that vanilla LLMs + web search are unsafe, so the team built a multi-agent system (Knowledge, Chat, Deep Triage) with semantic caching, hybrid search, and proactive scheduling [34]DeepLearningAI — AI Dev 26 x SF | Aditi Gupta: Building SRE Agents with the Redis Context Engine.
Problem framing at ~00:07. Why vanilla LLMs are unsafe at ~02:09. Knowledge base foundation at ~05:13: chunking + metadata filtering + vector storage. Multi-agent architecture at ~09:19: Knowledge + Chat + Deep Triage with MapReduce. Model tiering + semantic caching + context-window mgmt at ~13:23. Agent memory server + hybrid search + citations + proactive scheduling at ~22:26 [34]DeepLearningAI — AI Dev 26 x SF | Aditi Gupta: Building SRE Agents with the Redis Context Engine.
Eli Schilling at AI Dev 26 builds a research-paper assistant in a live notebook on Oracle's converged database, walking through short-term + long-term memory tables, the agent loop, context engineering (token budgets, summarization, offloading), and benchmark results comparing memory-equipped vs naive agents [35]DeepLearningAI — AI Dev 26 x SF | Eli Schilling: Hands On Agent Context & Memory Engineering with Oracle AI Database.
Why agents need memory at ~00:00. Data types + Oracle's converged DB at ~10:11. Memory architecture (short, long, agent loop) at ~17:14. Live notebook (7 memory tables) at ~22:20. Context engineering at ~37:30: token budgets, summarization, offloading. Benchmark vs naive agent at ~48:38 [35]DeepLearningAI — AI Dev 26 x SF | Eli Schilling: Hands On Agent Context & Memory Engineering with Oracle AI Database.
Nyah Macklin frames the “AI said so” auditability problem: 95% of agent projects fail because of fractured context, and knowledge graphs — with relationships as first-class citizens — outperform tables and vectors on auditability. Cites Jiang et al. (IEEE 2026) for 37% → 91% accuracy via Graph RAG, then demos a live auditable credit decision with causal traces [36]DeepLearningAI — AI Dev 26 x SF | Nyah Macklin: The AI Said So? How to Build Auditable AI Agents Using Context Graphs.
The “AI said so” problem at ~00:07. 95% failure rate framing at ~03:10. KG vs tables vs vectors at ~06:12. Graph RAG research (Jiang et al. 2026, 37% → 91%) at ~09:14. Context graphs defined at ~12:17. Auditable credit decision demo at ~17:25 [36]DeepLearningAI — AI Dev 26 x SF | Nyah Macklin: The AI Said So? How to Build Auditable AI Agents Using Context Graphs.
Pratik Verma's Okahu observability talk: agents fail on edge cases more than logic errors, so the right primitive is agentic tracing (Project Monocle, open-source) feeding a knowledge graph + LLM-as-judge evals, with CI/CD integration that auto-creates issues and runs Claude fix loops [37]DeepLearningAI — AI Dev 26 x SF | Pratik Verma: Observability Agent to Find & Fix Issues in AI Agents.
Why agents fail at ~00:07: edge cases, not logic. Project Monocle at ~02:07: open-source agentic tracing. Okahu observability agent + KG at ~04:07. Silent failures + LLM-as-judge at ~06:07. CI/CD + Claude fix loop at ~08:08. Full loop at ~11:12 [37]DeepLearningAI — AI Dev 26 x SF | Pratik Verma: Observability Agent to Find & Fix Issues in AI Agents.
Eda Zhou and Mahdi Ghodsi walk through building personal AI agents with open-source models — the LLM-vs-agent gap, the three components (model + runtime + tools), deploying Qwen 3.5 20B on AMD GPUs with vLLM, Open Claude onboarding with personality files, a live code-debug demo with reusable skills, and a multi-agent morning-briefing workflow [38]DeepLearningAI — AI Dev 26 x SF | Eda Zhou & Mahdi Ghodsi: Building Personal AI Agents with Open Source Models.
LLM vs agent at ~02:08. Three components at ~04:10. Qwen 3.5 20B on AMD + vLLM at ~07:12. Open Claude onboarding + personality at ~11:17. Live demo at ~18:24. Multi-agent morning briefing at ~27:36 [38]DeepLearningAI — AI Dev 26 x SF | Eda Zhou & Mahdi Ghodsi: Building Personal AI Agents with Open Source Models.
William Imoh and Charlie Wood on closing the “care gap”: the chart-prep burden and readmission risk for high-risk patients, why rules-based EMR systems fall short, and why a local vector DB + four-agent pipeline (Context, Risk, Protocols, Brief) running on-prem can ship a pre-visit brief for a high-risk patient [39]DeepLearningAI — AI Dev 26 x SF | William Imoh & Charlie Wood: Closing the Care Gap.
The care gap at ~00:07. Rules-based EMR limits at ~02:08. Architecture (hybrid queries + HNSW + on-prem embedding) at ~05:11. Four-agent pipeline at ~11:14. Live demo at ~21:27. Edge RAG takeaways at ~29:35 [39]DeepLearningAI — AI Dev 26 x SF | William Imoh & Charlie Wood: Closing the Care Gap.
Jean-Marie John-Mathews on systematic red-teaming for LLM apps: the Chipotle chatbot case, a taxonomy of intentional attacks vs legitimate-use failures, why LLM-as-judge falls short for agentic systems, multi-turn failure examples (frustrated customers, hidden tool-call errors), and a demo of Just Catch — an open-source skill that converts natural-language requirements into a test suite [40]DeepLearningAI — AI Dev 26 x SF: Jean-Marie John-Mathews: Red Teaming LLM Applications Systematically.
Chipotle chatbot case at ~00:07. LLM risk taxonomy at ~01:07. Why LLM-as-judge falls short at ~03:09. Multi-turn failures at ~05:11. Just Catch open-source skill at ~08:13. Live demo at ~10:15 [40]DeepLearningAI — AI Dev 26 x SF: Jean-Marie John-Mathews: Red Teaming LLM Applications Systematically.
Patrick Löber (Google DeepMind) walks the “any-to-any” multimodal architecture: phase-1 multimodal understanding (PDFs, video, audio), phase-2 agentic loop with function calling for multimodal generation, why native generation gives access to LM world-knowledge, the Live API's single audio-to-audio architecture, and new multimodal embeddings + Gemma 4 local agents [10]AI Engineer — Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind.
What any-to-any means at ~00:17. Phase 1 multimodal understanding at ~03:20. Phase 2 agentic loop at ~07:24. Why native generation matters at ~11:25. Live API single architecture at ~13:28. Multimodal embeddings + Gemma 4 at ~15:00 [10]AI Engineer — Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind.
Marc Klingen (Clickhouse/Langfuse) on skilling-up coding agents: skills are Rubik's-cube manuals (reliable shortcuts), the problem is stale pre-training context + 478 pages of docs, six concrete learnings (traces, CLI help flags, agent sitemaps, RAG search), basic eval setup with LLM-as-judge over filesystem state, auto-research where agents improve their own skills, and the open problems of skill versioning and distribution [41]AI Engineer — Skill issue: Lessons from skilling up coding agents to use Langfuse - Marc Klingen, Clickhouse.
Rubik's-cube framing at ~00:15. The problem at ~03:18: stale pre-training + 478 pages of docs. Six learnings at ~05:19: traces, CLI help, sitemaps, RAG search. Basic eval setup at ~12:24. Auto-research at ~14:24. Open problems at ~17:24: versioning, distribution, target-definition [41]AI Engineer — Skill issue: Lessons from skilling up coding agents to use Langfuse - Marc Klingen, Clickhouse.
Cormac Brick (Google AI Edge) on fine-tuning tiny LLMs for on-device agents: the two on-device GenAI patterns (system-level Gemini Nano vs app-level LiteRT/LiteLM), Function Gemma's 270M-parameter robust function calling, and how a synthetic-data fine-tuning workflow took function-calling accuracy from 46% to 90%+ [42]AI Engineer — From 46% to 90%: Fine-Tuning Tiny LLMs for On-Device Agents — Cormac Brick, Google.
AI Edge stack at ~00:15: system vs app GenAI. LiteRT runtime at ~01:15: 2.7B devices, CPU/GPU/NPU. Agent skills demo at ~06:19: AI Edge Gallery app on Gemma 4. LiteLM runtime + export at ~10:20. Function Gemma at ~13:24: 270M-parameter robust function calling. Fine-tuning workflow (46% → 90%+) at ~14:24 [42]AI Engineer — From 46% to 90%: Fine-Tuning Tiny LLMs for On-Device Agents — Cormac Brick, Google.
AI Daily Brief distills Jason Lou's “Codex maxing” post into nine practical tips: long-running mono-threads, voice + “the art of the ramble,” mid-run steering, structured memory in an Obsidian vault, computer/browser tools, remote control + mobile Codex, heartbeat check-ins, /goal for verifiable success criteria, and the side panel as workspace [43]The AI Daily Brief: Artificial Intelligence News — 9 Codex Tips from the Codex Team.
Tip 1 mono-threads at ~12:08. Tip 2 voice ramble at ~14:10. Tip 3 mid-run steering at ~15:11. Tip 4 Obsidian memory at ~16:12. Tip 5 tools (computer/browser/connectors) at ~19:12. Tip 6 mobile Codex at ~20:14. Tip 7 heartbeats at ~21:14. Tip 8 /goal at ~22:15. Tip 9 side panel at ~23:16 [43]The AI Daily Brief: Artificial Intelligence News — 9 Codex Tips from the Codex Team.
marimo highlights Kimi K2.6 as a speed milestone: a 100-line rhyming poem in ~9.2 seconds at ~120 tokens/sec on the Weights & Biases inference engine running on CoreWeave. The takeaway is that the “slow LLM” era for medium-context generation is ending [44]marimo — Kimi K2.6 - The End of Slow LLMs.
Two quick tool demos from today's feed. Better Stack profiles Understand-Anything, an open-source tool that turns any codebase into a queryable knowledge graph using static analysis + multi-agent LLM processing — 14k+ GitHub stars, positioned as the “codebase MRI” before refactoring something you didn't author [45]Better Stack — This AI Tool Maps Any Codebase Before You Touch It (Understand-Anything). marimo separately demos a reactive graph widget where selections in the visual graph are reactive to Python — hover state, multi-select, and programmatic node additions all flow bidirectionally between Python state and the rendered graph [46]marimo — Wayyy Better Graphs.
Artem Zhutov argues Markdown becomes a limiting format as notes grow in complexity, and proposes using AI-generated HTML artifacts inside Obsidian instead — unlocking interactive dashboards, dynamic tables, and richer documents while preserving Obsidian's local-first workflow [47]Artem Zhutov — Stop Writing Markdown in Obsidian. Do This Instead.
Two short productivity clips that rhyme. Real Python: build personal LLM skills the way you build libraries, so you stop re-teaching the model the same workflow on every new project — a framing that dovetails with Marc Klingen's AI Engineer talk on skills-as-Rubik's-cube manuals [48]Real Python — Build Custom LLM Skills to Save Hours of Work. Sequoia with Jake Stauch (Serval): the most common failure mode for enterprise automation is that building the automation isn't easier than just doing the manual task — the “skill-as-library” framing is one of the few moves that flips that inequality [49]Sequoia Capital — The simple test most automation platforms fail | Jake Stauch, Serval.
Arjay McCandless's system-design walkthrough on building a Google Drive-style upload service. Key insight: skip your API/server entirely on the file-upload path and have clients upload directly to S3 via pre-signed URLs, then notify your service to update metadata [50]Arjay McCandless — System Design: Google Drive.
OpenAI customer story with Abridge engineering manager Matt Sanders. GPT-5.5 noticeably improves fact extraction from doctor-patient conversations, particularly when the same topic resurfaces multiple times at varying depths during a visit — a long-standing failure mode for clinical documentation models [51]OpenAI — Built with GPT-5.5: Abridge Clinical AI Notes.
Two Nate B Jones shorts on the same axis. First: three-quarters of teenagers now use AI companions for emotional support, in some cases as their primary source of connection — his argument being that a chatbot “can't model empathy because it doesn't have anything to lose” [52]AI News & Strategy Daily | Nate B Jones — How ChatGPT Became Teenagers' Best Friend. Second: 2 billion kids attend schools designed for a 20th-century industrial economy, while Nature has published a peer-reviewed argument that AGI has arrived and 86% of students globally now use AI in coursework — the curriculum gap is a now-problem, not a 5-year problem [53]AI News & Strategy Daily | Nate B Jones — The calculator moment nobody's talking about in education.
Short Last Week in AI clip: society shouldn't be “shocked” by Mythos-style AI-PR-stunt cyber events anymore, and the next inflection — a bio-weapon-shaped incident — is foreseeable. Predictions in the segment are deliberately not hand-waved [54]Last Week in AI — The "bio-weapon version" of Mythos. Reads as a companion piece to Nate B Jones's argument above that society is consistently a step behind the actual deployment curve.