Anthropic snags Stainless; OpenAI ships Codex to Dell

May 18, 2026

40 topics · 30 sources

Industry

Anthropic

Anthropic Acquires Stainless to Strengthen SDK and Agent Connectivity

Anthropic acquired Stainless on May 18, 2026, a company that has built and maintained every official Anthropic SDK since the API launched. The acquisition is aimed at advancing agent connectivity and developer experience around MCP. ^{[1]Anthropic — Anthropic acquires Stainless}

Anthropic announced the acquisition of Stainless, an SDK and MCP server tooling company founded in 2022. Stainless generates software development kits, command-line tools, and connectors across multiple programming languages including TypeScript, Python, Go, Java, and Kotlin, and has powered every official Anthropic SDK since the API's inception. The company serves hundreds of organizations.

The strategic rationale centers on making Claude's agents more capable by expanding what they can connect to. Katelyn Lesse, Head of Platform Engineering at Anthropic, stated: "Agents are only as useful as what they can connect to." By joining forces, the combined team aims to push the frontier of developer experience and agent connectivity, leveraging MCP — the Model Context Protocol created by Anthropic — to enable broader ecosystem integration for Claude users and AI agents. Alex Rattray, Founder and CEO of Stainless, is joining Anthropic as part of the deal.

Agents are only as useful as what they can connect to. — Katelyn Lesse, Head of Platform Engineering, Anthropic

Tools: Claude, MCP, Model Context Protocol, Anthropic SDK

Industry

OpenAI

OpenAI and Dell Partner to Deploy Codex in Hybrid and On-Premises Environments

OpenAI and Dell Technologies are collaborating to bring Codex into enterprise hybrid and on-premises environments through the Dell AI Data Platform and Dell AI Factory. The goal is to let enterprises run Codex agents closer to their internal data, systems, and workflows. ^{[2]OpenAI — OpenAI and Dell Technologies partner to bring Codex to hybrid and on-premises enterprise environments}

OpenAI and Dell Technologies announced a partnership to make Codex deployable within the hybrid and on-premises infrastructure that enterprises already use. Codex is described as one of OpenAI's fastest-growing enterprise products, with over 4 million developers using it weekly. The collaboration centers on connecting Codex to the Dell AI Data Platform, which businesses use to store, organize, and govern on-premises data, enabling agents to leverage internal context such as codebases, documentation, business systems, and team workflows.

The two companies will also explore integrating Codex with the Dell AI Factory — Dell's platform for powering AI workloads — along with ChatGPT Enterprise and other API-based solutions. Use cases extend beyond software development to knowledge work tasks like gathering context across tools, preparing reports, routing product feedback, qualifying leads, writing follow-ups, and coordinating work across business systems. Dell's SVP and CTO Ihab Tarazi noted that the partnership gives customers 'a practical, secure path to deploying AI agents at scale' within their own premises.

{'text': "Collaborating with OpenAI brings together Dell's industry-leading enterprise grade infrastructure with cutting edge agentic AI harnesses and models from OpenAI. The Dell AI Factory with OpenAI Codex will allow enterprises to deploy AI where enterprise data already lives, within their premises, giving customers a practical, secure path to deploying AI agents at scale.", 'speaker': 'Ihab Tarazi, SVP and CTO, Infrastructure Solutions Group, Dell Technologies'}

Tools: Codex, Dell AI Data Platform, Dell AI Factory, ChatGPT Enterprise

AI Future

Import AI

AI Stuxnet: Historic Software Sabotage Targeting Scientific Calculations

Researchers uncovered fast16.sys, a 20+ year-old malware that silently corrupted high-precision calculations in engineering and weapons-modeling software, predating Stuxnet by five years. ^{[3]Import AI — Import AI 457: AI stuxnet; cursed Muon optimizer; and positive alignment}

SentinelOne researchers analyzed fast16.sys, a sophisticated virus targeting precision calculation tools used in civil engineering, physics, and weapons research. The malware injected floating-point unit instructions to introduce subtle, systematic errors into calculations—making detection extremely difficult. Primary targets included LS-DYNA (used in nuclear weapons modeling), PKPM, and the MOHID hydrodynamic modeling platform. By propagating these inaccuracies across an entire facility's machines, the attackers could degrade scientific output without triggering obvious alarms. This discovery raises urgent questions about AI-era equivalents: adversaries could use similar principles to corrupt AI training pipelines or evaluation benchmarks.

By combining this payload with self-propagation mechanisms, the attackers aim to produce equivalent inaccurate calculations across an entire facility.

Tools: LS-DYNA 970, PKPM, MOHID

AI Models

Import AI

Aurora Optimizer Fixes Neuron Death Problem in Muon

Tilde Research found the popular Muon optimizer kills over 25% of neurons during training and developed Aurora, a leverage-aware replacement that achieves lower loss and higher MMLU scores. ^{[3]Import AI — Import AI 457: AI stuxnet; cursed Muon optimizer; and positive alignment}

Researchers at Tilde Research identified a critical flaw in the Muon optimizer: its leverage-based updates cause permanent neuron deactivation in MLP layers, with more than one in four neurons effectively dead by training step 500. This bimodal distribution of leverage scores degrades model quality and downstream task performance. In response, the team developed Aurora, an optimizer designed to handle rectangular matrices in a leverage-aware way. Head-to-head testing on 1.1B-parameter transformers showed Aurora achieving a lower final loss (2.26 vs. Muon's 2.31) and improving MMLU scores by 10 points, suggesting it handles memorization-intensive tasks significantly better.

By step 500, more than one in four neurons are effectively dead, producing a sharply bimodal distribution of leverage scores.

Tools: Muon, NorMuon, Aurora, 1.1B-parameter transformer

AI Future

Import AI

Positive Alignment: Designing AI to Support Human Flourishing

A multi-institution position paper argues that AI safety research over-indexes on preventing harm and proposes 'positive alignment'—building systems that actively support human thriving. ^{[3]Import AI — Import AI 457: AI stuxnet; cursed Muon optimizer; and positive alignment}

Researchers from Oxford, Google DeepMind, OpenAI, Anthropic, and other institutions argue that conventional safety frameworks focus on failure prevention while neglecting the opportunity to help humans genuinely flourish. Positive alignment reframes the goal: rather than just avoiding bad outcomes, AI should be designed to support human aims in context-sensitive, user-authored ways. The authors warn that purely defensive stances risk producing superficially compliant but mediocre or sycophantic systems. The framework calls for decentralized, diverse governance structures so communities can define flourishing on their own terms rather than having it imposed from above.

A model can satisfy all safety constraints while being mediocre, sycophantic, or unhelpful.

AI Models

Import AI

LLMs as Autonomous AI Researchers: Strong at Search, Weak on Novel Ideas

Prime Intellect benchmarked GPT-5.5 and Claude Opus 4.7 on the nanoGPT speedrun challenge, finding agents excel at hyperparameter search and method stacking but struggle to generate genuinely new research insights. ^{[3]Import AI — Import AI 457: AI stuxnet; cursed Muon optimizer; and positive alignment}

Prime Intellect ran approximately 10,000 training runs across 14,000 H200 GPU-hours, tasking frontier LLMs with optimizing the nanoGPT speedrun challenge. Both GPT-5.5 and Claude Opus 4.7 outperformed human baselines on hyperparameter sweeps and combining known techniques, but consistently failed to produce original ideas. The agents tended to accumulate components rather than prune them, relying on exhaustive search rather than elegant insight. This suggests contemporary LLMs are effective 'hillclimbers' in research settings but remain fundamentally dependent on the human-generated discoveries they were trained on.

Agents are very good at optimizer search, hyperparameter sweeps, and stacking methods together, but they struggle to come up with new ideas on their own.

Tools: GPT-5.5, Claude Opus 4.7, nanoGPT, PE-8

Industry

Tech Brew

SpaceX accelerates IPO timeline targeting record valuation

SpaceX is pushing up its IPO to as early as June 12, 2026, with a ~$1.75 trillion valuation that would make it the largest IPO in history, driven by a faster-than-expected SEC review. ^{[4]Tech Brew — SpaceX speeds up its IPO}

SpaceX is accelerating its initial public offering and targeting a Nasdaq listing as early as June 12, 2026, with a prospectus filing potentially imminent. The company is eyeing a valuation of approximately $1.75 trillion, which would position it as the largest IPO in history. The accelerated timeline was triggered by an unexpectedly swift SEC review. The article, written by Lindsey Choo and published May 18, 2026, notes that SpaceX represents the profitable division of the broader SpaceXAI merger that completed in February 2026.

The IPO faces several headwinds. A critical Starship test flight is scheduled ahead of the listing — the redesigned megarocket is central to NASA's 2028 moon mission. The merged SpaceXAI entity has also seen significant talent attrition, with over 50 senior researchers and engineers departing since the February merger, and all xAI co-founders except Elon Musk having left. IPO filings reveal a control structure that prevents Musk from being removed as CEO and board chairman without his own consent. CNBC's Jim Cramer warned that share scarcity could cause SpaceX to "create a bubble unto its own."

SpaceX would create a bubble unto its own

Industry

Sherwood Snacks

Tesla Robotaxi Crash Records Unredacted

Tesla unredacted its NHTSA filings revealing 17 Robotaxi crashes in Austin between July 2025 and March 2026, with the system or its teleoperators at fault in roughly seven incidents — and Elon Musk had falsely claimed no accidents occurred. ^{[5]Sherwood Snacks — Details of 17 Tesla Robotaxi crashes revealed}

Tesla filed unredacted crash narratives with the National Highway Traffic Safety Administration covering 17 incidents involving its Robotaxi service in Austin, Texas from July 2025 through March 2026. All vehicles were 2026 Model Y units operating with a human safety monitor in the front seat. Of the 17 crashes, 13 resulted in property damage only, two produced no injuries, one caused a minor injury without hospitalization, and one required a hospital visit after an SUV rear-ended the Robotaxi at low speed in a slip lane. The fault breakdown shows roughly seven crashes attributable to Tesla's autonomous driving system or its remote operators: two incidents occurred after teleoperators took over control and then drove the car into a metal fence at 8 mph (July 2025) and a construction barricade at 9 mph (January 2026); others involved the vehicle striking a metal chain while completing an unprotected left turn, contacting a wooden electrical pole while reversing, and bumping a dump trailer hitch. The remaining incidents were caused by other road users — rear-endings at traffic lights and stop signs, a bus sideswipe, and a pedicab clipping a mirror.

The disclosure is notable because Tesla had previously redacted every narrative section in its NHTSA filings, citing "confidential business information," making it impossible to assess what was actually happening in each crash. The unredacted records also include a September 2025 incident where a dog ran into the intersection; the ADS slowed and steered left but the dog made contact with the front right bumper before running away. The transparency comes after CEO Elon Musk stated on Tesla's April 2026 earnings call that no Robotaxi accidents had occurred — a claim directly contradicted by the company's own regulatory filings. Critics noted that if the system cannot handle edge cases and teleoperators are themselves causing crashes during interventions, it raises serious questions about the service's readiness for broader expansion into Houston and Dallas.

If Tesla can't trust its ADS to navigate tricky situations and its teleoperators are crashing the cars when they intervene, that's a real problem.

Productivity

Data Science Weekly

Why Averages Can Mislead

When data is skewed, the mean may not represent the typical value — alternative measures like the median or mode can give a more accurate picture of reality. ^{[6]Data Science Weekly — Monday Statistics: Why Averages Can Mislead}

This article examines a common statistical pitfall: relying on the arithmetic mean when data distributions are skewed. In right-skewed distributions (e.g., income, house prices, or response times), a small number of extreme high values pull the mean far above what most data points look like, making the 'average' an unrepresentative summary statistic.

The piece argues that practitioners should default to the median — or consider the full distribution — before reporting or acting on averages. Understanding skewness and choosing the right central tendency measure is a foundational skill for anyone working with real-world data.

When your data is skewed, the 'average' may not represent reality.

Podcast

AI Engineer

Ash Prabaker & Andrew Wilson at AI Engineer: Building Agents That Run for Hours

Anthropic Applied AI engineers Andrew Wilson and Ash Prabaker walk through how Claude's coding agents went from 20-minute runs on Sonnet 3.7 to 30+ hour autonomous builds on Opus 4.6, then unpack the GAN-inspired planner/generator/evaluator harness they use to one-shot full-stack apps, including hard-won lessons on rubrics, context strategy, and trace reading. ^{[7]AI Engineer — Build Agents That Run for Hours (Without Losing the Plot) — Ash Prabaker & Andrew Wilson, Anthropic}

Andrew Wilson opens with a year-long history tour of how Claude's coding harness co-evolved with model releases ~01:14. He frames three reasons long-running agents are hard: finite context windows that cause 'context rot' and 'context end anxiety,' weak planning (one-shotting everything or stopping half-done), and the model's inability to honestly judge its own output ~02:15–~04:16. Anthropic attacks the problem on two axes — baking capabilities into model weights (the METR chart showing Opus 3.7 at ~1hr to Opus 4.6 at ~12hr on a minimal scaffold) and improving the harness via the Agent SDK ~04:16–~05:16. The release timeline covers Sonnet 3.5 artifacts, computer use, MCP, Claude Code research preview with Sonnet 3.7, the Ralph Loop popularized by Jeffrey Huntley (July 2025) and Anthropic's in-session variant using stop hooks ~07:18–~09:21, Sonnet 4.5 with checkpoints and the rename from Claude Code SDK to Agent SDK ~09:21–~10:22, then the Haiku 4.5 / Opus 4.5 family that made sub-agents economical and planning strong enough to use Opus 4.5 as planner + Sonnet 4.5 as workhorse, plus Skills with progressive disclosure and programmatic tool calling ~10:22–~11:22. Andrew walks through Anthropic's first long-running-agents blog post (Nov 2025): an initializer breaks a vague prompt into a featurelist.json (JSON over markdown because models overwrite markdown more readily), a progress file, an init script, then a loop that picks one feature in a fresh context window, smoke-tests, implements, verifies via Puppeteer, commits, and repeats ~12:23–~13:24. With Opus 4.6 / Sonnet 4.6, agent teams (sub-agents that talk to each other, not just back to main), server-side compaction, and 1M-context GA changed the calculus — single long sessions now beat fresh-session juggling for many workloads ~13:24–~16:28.

Ash takes over at ~17:28 to cover state-of-the-art harness patterns. The core idea is GAN-inspired: a generator builds, a discriminator/evaluator grades using real tools (Playwright opening live pages, clicking, screenshotting), and adversarial pressure between separate context windows produces better results than asking one Claude Code session to review itself ~18:30–~19:31. The asymmetry Anthropic exploits: tuning a standalone critic to be harsh is tractable; tuning a builder to be genuinely self-critical is not — same as humans being better critics than creators ~19:31–~20:31. To grade subjective qualities like front-end taste, they write a four-criterion rubric — design, originality, craft, functionality — weighted toward design and originality (since Opus 4.6 already nails functionality), then calibrate with few-shot reference sites to fight 'purple gradient AI slop' ~20:31–~22:32. A key behavior of the harness is that it pivots: if originality keeps scoring low, the GAN throws everything out and restarts from scratch, whereas single-pass or Ralph-loop setups keep patching the same broken thing ~22:32–~23:33. They add a planner that breaks a one-line prompt into high-level sprints but deliberately avoids granular technical detail (errors there cascade across multi-hour horizons) — essentially a PM/IC/QA org structure with separate context windows ~23:33–~24:34. The crucial glue: before the generator writes a line, generator and evaluator negotiate a 'contract' of what done means via markdown files on disk, iterating until both agree, then the evaluator grades against that contract rather than the planner's original spec — bridging vague user stories into testable assertions ~24:34–~26:36.

Ash demos a 'build a retro game maker' prompt: the solo loop produced a pretty-looking sprite editor where arrow keys did nothing in play mode ~26:36–~28:38; the harness run (~$200, ~6 hours, same model) produced 'Retro Forge' with a 54-color palette, a recursive AI-level-assistant feature the planner inferred from a vague spec line, a live debug HUD for the evaluator's benefit, and working physics — purely because the evaluator actually played the game ~28:38–~31:39. The contract had 27 criteria, which Ash flags as the granularity needed for actionable critiques ~31:39–~32:40. He's candid that Claude is a bad QA agent out of the box — same sycophancy/generosity bias as LLM-as-judge generally — so they spent enormous time tuning prompts by reading traces by hand, finding where the model's judgment diverged from theirs, and adjusting; a useful tooling tip is piping transcripts into files and grepping them with another agent ~32:40–~34:40. On adapting harnesses as models improve ~34:40–~36:41: with Opus 4.6 they dropped context resetting between sessions (Opus 4.5 had bad context-end anxiety, 4.6 doesn't), dropped sprint decomposition (4.6 holds 2-hour continuous coherent builds), and moved the evaluator from every-sprint to end-of-one-shot. Final setup is just planner + generator + evaluator + filesystem state, costing roughly half the previous runs; a DAW music app demo shows the simplified harness still working ~35:40–~37:41. Closing takeaways at ~38:43–~39:45: self-evaluation is a trap (use adversarial), compaction ≠ coherence (lossy summaries drift), structured handoffs + clean contexts win, subjective quality is gradable if you write down strong opinions, and read the traces — that's the only way to know what scaffolding to delete as the frontier moves. He highlights primitives anyone can use today: auto mode (safer than --dangerously-skip-permissions), custom sub-agents, Playwright MCP / Claude for Chrome MCP, and skills as a way to package grading rubrics.

Q&A surfaces several practical points. On evaluator improvement ~39:45–~41:46: tuning targets common model weak points (e.g., design taste) so it generalizes across projects, not per-app secret sauce. On Ralph loop vs. long sessions ~41:46–~43:47: with 1M context GA and 4.6's coherence, single long sessions with compaction now work for the generator/evaluator pattern, but Ralph still has a place depending on use case — Ash treats it as a temporary patch for today's context rot. On Playwright ~43:47–~45:48: Playwright MCP or Claude for Chrome MCP; vision on 4.x is now good enough to detect overlapping text and read console/network errors reliably. On stopping behavior ~45:48–~48:50: surprisingly, 4.6 models cheerfully throw away 10 passes and restart when they can't hill-climb the rubric, so human-in-the-loop wasn't needed as often as expected. On planner intervention ~48:50–~50:51: the planner's spec is reinjected into sessions as a reference, but the planner is deliberately kept out of the inner loop — its job is just outer guardrails; the generator/evaluator contract pattern generalizes to multi-stage workflows (e.g., synthetic data generation pipelines). On model selection ~51:52–~53:54: harness design is informed by model — Opus 4.6 for planning + Sonnet 4.6 for execution is a common cost-tuned configuration. On long-lived products ~53:54–~56:55: filesystem state is the default; embed prompting that tells the harness to write breadcrumbs to JSON (tried this → bug → fix → worked) plus a live high-level docs file so humans and future Claude Code sessions can pick up. On agent teams vs. generator/critic ~56:55–~59:56: Anthropic doesn't have a strongly opinionated stance — generator/evaluator is a subset of agent-teams thinking and the two compose (each sub-agent in a team can have a paired critic). On giving the critic the generator's context ~59:56–~61:59: they tried it and stopped — it muddies thinking; the critic should judge only the output, not the trace, otherwise both sides convince themselves things work. On traceability ~61:59–~63:00: there is no magic — read traces by hand, optionally use Claude with custom prompts as a first-pass filter. On measuring harness quality ~63:00–~65:01: write extremely detailed rubrics at generator and evaluator levels (taste, API design, code quality), then measure where the model started vs. ended on each criterion within a run; this is opinionated and greenfield-flavored, not directly portable to brownfield. On team collaboration ~65:01–~67:04: not well-solved yet — bottoms-up adoption with one owner maintaining a composable harness; collaboration mostly happens via SCM hygiene, PRs, and git worktrees. On scrum-style human review ~67:04–~70:05: the goal is to remove humans from the loop, not bake them in — if you need intervention, use hooks for stop conditions, but Anthropic prefers iterating prompts after autonomous runs over inserting humans mid-run. On brownfield use ~70:05–~72:06: the pattern is mostly greenfield-suited; brownfield benefits more from full SDLC automation (monitoring → issue → PR → review). On 'reading traces' ~73:06–~75:07: it's literally reading the raw output line by line — Ash describes a Claude for Chrome empathy exercise where the team browsed the web with their eyes mostly closed, opening for a 10-second snapshot at a time, to internalize how the model sees the page; the learnings flow back into prompt templates, CLAUDE.md, skills, and Claude Code's new auto-memory.

Sections

~00:14 — Intros and why long-running agents are hard
~04:16 — Two levers: model weights (METR chart) vs. harness (Agent SDK primitives)
~06:17 — Year-long release history: Sonnet 3.5 artifacts to Sonnet 3.7 + Claude Code research preview
~07:18 — Ralph Loop, Anthropic's stop-hook variant, and Sonnet 4.5 / Claude Code 2.0 checkpoints
~10:22 — Haiku/Opus 4.5, Skills with progressive disclosure, and the first long-running-agents blog post (initializer + featurelist.json)
~13:24 — Opus/Sonnet 4.6: agent teams, server-side compaction, 1M context GA
~17:28 — GAN-style generator/evaluator: separate contexts, adversarial pressure, exploiting critic/builder asymmetry
~20:31 — Grading taste with a four-criterion rubric (design, originality, craft, functionality)
~23:33 — Adding a planner; generator/evaluator negotiate a contract before any code is written
~26:36 — Retro Forge demo: solo loop vs. harness ($200, 6hr, 27 contract criteria)
~32:40 — Claude is a bad QA agent out of the box — debug by reading traces, not running more experiments
~34:40 — Adapting harness as models improve: dropping context resets and sprint decomposition for Opus 4.6
~38:43 — Takeaways and Q&A on Playwright, model selection, brownfield use, traceability, and trace-reading empathy

{'timestamp': '08:20', 'text': "Deterministically bad in an undeterministic world... it's better to fail predictably than it is to succeed unpredictably.", 'speaker': 'Andrew Wilson (paraphrasing the Ralph Loop ethos)'}

{'timestamp': '16:28', 'text': "The harness doesn't just disappear as the models get better. It's really evolving as the models change over time — finding the gaps in the model and then filling that in with the harness, and then you train the model on that aspect of the harness, and maybe at some point you actually remove that entirely.", 'speaker': 'Andrew Wilson'}

{'timestamp': '19:31', 'text': 'Tuning a standalone critic to be harsh is actually very tractable, but tuning a builder to be somewhat self-critical is not.', 'speaker': 'Ash Prabaker'}

{'timestamp': '22:32', 'text': "Most people say you can't grade taste, but we think you can if you have a strong enough opinion on it and you just kind of write it down.", 'speaker': 'Ash Prabaker'}

{'timestamp': '25:34', 'text': 'Before the generator actually goes ahead and writes a single line, we have the two agents basically negotiate what done actually means.', 'speaker': 'Ash Prabaker'}

{'timestamp': '32:40', 'text': 'The primary debugging loop was this — reading what the agent actually did, finding where its judgment diverged from ours as humans, and then tuning the prompt for that. It was the same kind of muscle as reading a stack trace.', 'speaker': 'Ash Prabaker'}

{'timestamp': '38:43', 'text': 'Self-evaluation, very much a trap. Just use an adversarial evaluator.', 'speaker': 'Ash Prabaker'}

{'timestamp': '59:56', 'text': "It's very easy for the model to kid itself that something is working — and that feeds into the evaluator as well.", 'speaker': "Ash Prabaker (on why the critic should not see the generator's traces)"}

{'timestamp': '73:06', 'text': "It's a really important skill when building agents in general — to empathize as much with the model... spend as much time with these models, reading through line by line, being like, oh, why did it think this?", 'speaker': 'Ash Prabaker'}

Tools: Claude Code, Claude Code SDK / Agent SDK, Claude Sonnet 3.5 / 3.7 / 4 / 4.5 / 4.6, Claude Opus 3.7 / 4 / 4.5 / 4.6, Claude Haiku 4.5, MCP (Model Context Protocol), Playwright MCP, Claude for Chrome MCP, Puppeteer, Computer use, Skills (progressive disclosure), Programmatic tool calling, Agent teams, Sub-agents, Server-side compaction, Checkpoints (Claude Code 2.0), Hooks / stop hooks, Auto mode (vs. --dangerously-skip-permissions), Auto memory, CLAUDE.md, Git worktrees, Ralph Loop (Jeffrey Huntley), METR long-task benchmark, SWE-bench

Podcast

AI Engineer

Guillaume Vernade at AI Engineer: Going Bananas with Google's GenMedia Stack

Google DeepMind DevRel Guillaume Vernade walks through Google's generative media stack — Nano Banana 2, Veo 3.1, Lyria, and Gemini TTS — by building a live Colab that illustrates an entire public-domain book end to end with images, video, music, and multi-voice narration. ^{[8]AI Engineer — Let's go Bananas with GenMedia — Guillaume Vernade, Google DeepMind}

Guillaume Vernade opens by introducing himself as a developer advocate at Google DeepMind, formerly a Stadia producer, whose job is to make sure new model releases ship with usable docs, samples, and SDKs — and to push back internally when each model team invents its own API surface ~00:14. He frames the talk around Google's gen media lineup and DeepMind's longer-term bet on world models that ingest and emit all modalities, even though for shipping reasons they release specialized models (image, video, music, robotics, agents, Gemma 4, research models like AlphaEvolve and AlphaGenome) ~03:19. He notes DeepMind ships something roughly every five days on average ~06:23.

He then runs through recent releases: Nano Banana 2 (Gemini 3 Flash Image Preview) with new aspect ratios up to 4K, search grounding, and image grounding for better visual references; Veo 3.1 and the new cheaper Veo 3.1 Fast/Light at ~5 cents per second; and Lyria, the new music model that produces 30-second clips or full 3-minute songs. He calls out Lyria RealTime as his personal favorite — a predictive (not diffusion) model that streams music continuously and DJ-mixes between prompts in ~2 seconds ~07:23.

The core of the session is a live cookbook Colab (goo.gle/cookbook-illustration) that illustrates Kenneth Grahame's *The Wind in the Willows* using the full stack ~11:26. Guillaume sets up the GenAI SDK with automatic retries, explains the three Google surfaces — consumer Gemini apps, AI Studio / Gemini Developer API, and Vertex AI — and uses Gemini chat mode with structured JSON output to generate character prompts, then Nano Banana 2 to render the characters in a 'colorful building block style' ~15:31. He demonstrates the new priority service tier (2x price, fast lane) alongside standard and flex (50% discount, delayed) tiers, shipped the day before ~19:35.

For chapter illustrations he switches from chat mode to a smarter pattern: ask Gemini to list which characters appear in each chapter, then pass only those character reference images into a unary generate_content call, giving Nano Banana 2 tighter context ~29:46. He also previews the new stateful Interactions API which stores conversation state server-side (~2 days), auto-caches context, and makes forking conversations cheap — likely the default API at I/O ~27:44.

For video, he feeds the last chapter image to Veo 3.1 as a first frame and shows that reusing the image prompt produces a decent but slightly off result (wrong character speaks); the fix is asking Gemini to write a dedicated Veo prompt describing 'what happens in the next few seconds' ~34:53. He notes that Gemini itself is heavily used to generate training prompts for the gen media models, which is why Gemini-written prompts work so well, and that all gen media models internally rewrite short prompts anyway ~41:01.

For music, Lyria Clip generates per-chapter instrumental songs (~4 cents) entirely controlled by prompt — BPM, scale, instruments, intro/chorus/outro structure, even inline lyrics with timestamps returned in the output (useful for karaoke apps) ~42:02. He demonstrates lyric generation and notes that chat-mode memory anchors the songs to a similar theme.

The TTS section shows a clever trick: using Gemini's two-voice TTS to fake multi-character dialogue by assigning one voice to the narrator and another to all characters, but instructing each character to speak in a different style (whispering, breathless, stuttering, accents) so they sound distinct even with a shared voice ~47:16. He warns you must prefix the input with 'read this text' or the model won't speak it.

He closes with a tour of Lyria RealTime in AI Studio — the live DJ-style model that crossfades between prompts — pitching it for video game adaptive soundtracks, and a 'Space DJ' demo where each planet is a music prompt and proximity controls the mix ~55:20. Q&A covers Europe availability (short answer: no, because preview models are global-endpoint only on Google Cloud, but he's pushing internally to get models to GA faster) ~58:23 and the difficulty of using darker books like *Frankenstein* due to safety filters ~60:24. He wraps by encouraging attendees to explore the DeepMind cookbook GitHub repo for quickstarts and end-to-end examples ~57:23.

Sections

~00:14 — DevRel role at DeepMind and pushing back on per-model APIs
~03:19 — World models vision and DeepMind's full model lineup
~07:23 — Recent releases: Nano Banana 2, Veo 3.1 Light, Lyria, Lyria RealTime
~11:26 — Cookbook walkthrough: illustrating a public-domain book with the full stack
~15:31 — AI Studio vs Vertex AI and the new priority/flex service tiers
~22:37 — Generating character images with Nano Banana 2 and chat-mode history
~29:46 — Per-chapter scenes with targeted character references and the Interactions API
~34:53 — Veo 3.1 video generation and dedicated motion prompts
~42:02 — Lyria music generation, lyrics with timestamps, and prompt-only control
~47:16 — TTS trick: faking multi-character dialogue with two voices and style cues
~55:20 — Lyria RealTime, Space DJ demo, and Q&A on Europe availability

Tools: Nano Banana 2, Veo 3.1, Veo 3.1 Light, Lyria, Lyria RealTime, Gemini 3 Flash Image Preview, Gemini 3.3 Flash, Gemini TTS, Gemma 4, AI Studio, Vertex AI, GenAI SDK, Google Colab, Interactions API, AlphaEvolve, AlphaGenome

Podcast

AI Engineer

Eoin Mulgrew at AI Engineer: Rewiring the [UK] State with AI

Eoin Mulgrew from the 10 Downing Street data science team (10DS) describes a small insurgent unit that recruits exceptional outside technical talent — lab researchers, big tech engineers, YC founders — into government fellowships, pays market-ish rates, and embeds them as forward-deployed engineers across UK departments. He walks through concrete shipped work (policy simulation, statute-book analysis that displaced a £1.5M law firm engagement, delivery red-teaming dashboards, the Extract planning-application tool with DeepMind/Gemini, AI tutor safeguards, prison-system work via Just AI) and frames the program as a pilot to prove that elite teams can move government at unusual speed. ^{[9]AI Engineer — Rewiring the State — Eoin Mulgrew, 10 Downing Street}

~00:00 Eoin Mulgrew opens by introducing himself as part of 10DS, the No. 10 Downing Street data science team set up during the pandemic, whose core mission is making sure the country's most important decisions are informed by the best evidence. He explains that 10DS is now radically scaling up its own AI engineering capability — not just for No. 10, but to drive AI adoption across strategically important parts of the UK state.

~02:08 He sets the scene with the UK's public service delivery crisis: 7.25 million people on NHS waiting lists, ~350,000 backlogged court cases, only 1 in 5 planning applications decided on time, and a public-sector productivity crisis worsened by the pandemic. Citing a Tony Blair Institute figure of a £40B annual productivity prize from AI in government, he argues the 400,000-person civil service should be viewed as a large, complex industry ripe for disruption.

~03:09 Mulgrew diagnoses why government struggles to build high-performing technical teams: uncompetitive pay, deep hierarchy, bureaucracy, slow movement, and real-and-perceived regulatory friction. He acknowledges some safeguards are sensible (accountability to Parliament and the public) but argues the net effect is an environment that repels ambitious technical people.

~05:11 He introduces the 'insurgency model': a small unit at the center with a No. 10 mandate, unusually high political backing, autonomy to be opportunistic about where to land, market-rate pay (within reason — 'not Meta money'), and permission to bypass the standard civil service hiring process. Their bespoke, technically rigorous selection process has a ~0.7–0.8% success rate, and — critically — they recruit *exclusively outsiders* from frontier labs, big tech, top research institutes, YC founders, and serial entrepreneurs. He cautions this isn't as simple as 'big stick from ministers'; the point is to recruit missionaries, not mercenaries.

~08:12 Operating mode: there's an abundance of low-hanging AI use cases across the legacy system that 10DS handles in-house, with forward-deployed engineers — the first in No. 10's history — embedding with policy advisers, lawyers, comms people, and pollsters to go from idea to shipped capability in a couple of weeks. For harder problems (e.g., the major backlogs), they take a partnership model, seconding people into other departments for longer engagements.

~10:13 Concrete in-building examples follow. Policy simulation lets policy teams model the impact of decisions (e.g., universal credit changes on household finances) before they're made — not to replace human analysis, but to dramatically increase how many decisions are informed by high-quality modeling. The statute-book analysis case: the Cabinet Office was about to spend £1.5M on an outside law firm to analyze the entire UK statute book ('the height of four African elephants of legalese'); instead, a 10DS engineer embedded with in-house lawyers for two weeks and built a tool the team can re-run any time — solving both cost and the fact that the law-firm output would have gone stale faster than it could be produced. A delivery red-teaming tool — effectively a PMO in delivery teams' pockets — interrogates departmental progress reports and flags optimism bias, persistent amber-rating patterns, and whether mitigations have historically worked.

~13:15 He notes a transparency upside: until recently the government had never published a public-facing delivery dashboard; two have now shipped in as many months, including one tracking Matt Clifford's AI Opportunities Action Plan. He also teases (without details) a new public service launching in 2.5 weeks that millions will use, conceived only two months earlier — a timeline that would normally see a project still in 'discovery' a year in.

~14:15 Mulgrew turns to partner organizations across the wider ecosystem. The AI Safety Institute (AISI) — the world's first government body for evaluating frontier models — was seeded with 10DS fellows from day one; Dr. Harry Coppock led work on the Inspect tool, a CFI-isolated environment for testing what AI agents actually do when given autonomy and tools. The Incubator for AI (i.AI), now in DSIT, was largely founded by former 10DS fellows and is a true spin-out; 10DS continues to collaborate as i.AI scales work. A recent example is Extract, a DeepMind/Gemini collaboration that digitizes planning applications (including hand-drawn maps), unveiled by the Prime Minister at London Tech Week and now rolling out to every local authority in England — with the aspiration that more applications eventually be decided automatically by AI.

~17:18 On AI tutors and the education gap, Mulgrew is cautious: rather than building competing products, 10DS is producing safeguards and evaluating frontier models against benchmarks like cognitive load on students, so schools can adopt third-party tools safely. Then Just AI — a new MOJ team founded by former fellow Dan James — deploys forward-deployed engineers into prisons and the criminal justice system, embedding with parole officers and prison wardens to stop drug flow into prisons, automate manual processes, and improve safety.

~19:21 He closes with current fellow Will, who a few months ago was 'in California getting a tan' after dropping out of Harvard, founding a YC-backed company, and selling it — and in week two of the fellowship is standing outside HMP Wandsworth with the keys to the prison. That, Mulgrew says, is the pitch: 'We'll give you the keys to the state and see what you can do.' Early proof points are real — money saved, public services shipped at unprecedented speed, frontline services reformed, new AI capability in the hands of top-of-government teams — and he ends with an explicit recruitment pitch.

~21:25 In Q&A, he addresses sycophancy risk in policy simulation tools (mitigated via red-teaming and upskilling non-technical users), how to scale beyond an 'insurgent' pilot (eventually making this BAU rather than a hack, and going after horizontal cross-government use cases like transcription and call centers in DWP/HMRC), the AI-tutors strategy (benchmarks and guardrails, not products), and limited but real international collaboration with US Digital Service-style task forces and Singapore.

Sections

~00:00 — Intro: 10DS and scaling AI engineering inside No. 10
~02:08 — UK public service delivery crisis and the £40B AI prize
~03:09 — Why government struggles to attract elite technical talent
~05:11 — The insurgency model: mandate, autonomy, outsider-only recruiting
~08:12 — How they operate: forward-deployed engineers and partnership model
~10:13 — In-building wins: policy sim, statute-book analysis, delivery red-teaming
~14:15 — Partners: AISI, the Incubator for AI, Extract for planning applications
~17:18 — AI tutors safeguards and Just AI in the criminal justice system
~19:21 — Will at HMP Wandsworth, recruitment pitch, and early proof points
~21:25 — Q&A: sycophancy, scaling beyond the pilot, EdTech, international collaboration

{'timestamp': '05:11', 'text': 'Let us take the shackles off. Let us set up a small insurgent unit at the very center that is not burdened by some of the constraints I just described.'}

{'timestamp': '08:12', 'text': "We do want to recruit missionaries, not mercenaries. The pay matters, but it's not alone — a paycheck is not going to get you out of bed in the morning when stuff gets hard."}

{'timestamp': '11:15', 'text': 'The Cabinet Office was about to spend £1.5 million getting an outside firm of lawyers to come in and do analysis of the entire UK statute book... Instead, one of our engineers embedded with that team of in-house lawyers for a couple of weeks.'}

{'timestamp': '19:21', 'text': "You've maybe done good stuff in industry — that's brilliant. Come join us, and we'll give you the keys to the state and see what you can do."}

{'timestamp': '23:25', 'text': "This at the moment is basically a hack to get around the system. We need to change that — I would like to see a lot of what we're doing become the norm, become BAU."}

Tools: Gemini (Google DeepMind) — underlying model for Extract planning-application tool, Inspect — AISI's CFI-isolated environment for evaluating AI agent behavior with autonomy and tools, Extract — DeepMind/10DS planning-application digitization tool rolling out to every English local authority, AI Opportunities Action Plan public dashboard, Internal policy simulation tool (e.g., universal credit impact modeling), Statute-book analysis tool (replaced £1.5M law-firm engagement), Delivery red-teaming / PMO tool for No. 10 delivery oversight

Podcast

Latent Space

Latent Space: Yaroslav Azhnyuk (The Fourth Law) and Noah Smith on AI, Robotics, and the Next War

Ukrainian founder Yaroslav Azhnyuk (Pet Cube, now The Fourth Law and Odd Systems) and economist Noah Smith join Latent Space to argue that drone-defined, software-updatable warfare has already overtaken tanks and artillery, that China's manufacturing scale is a strategic alarm bell, and that the West is several years behind on autonomy, mass production, and supply chains. ^{[10]Latent Space — The Next War Is Already Here — Yaroslav Azhnyuk, The Fourth Law & Noah Smith, Noahpinion}

Brandon hosts Noah Smith and Yaroslav Azhnyuk, who pivoted from making pet cameras to building explosive-carrying drones after Russia invaded Ukraine on Feb 24, 2022. ~01:04 Yaroslav recounts landing in Kyiv on the last flight before the war, fleeing west, and concluding it would be 'immoral not to fight back.' ~07:15 He co-founded Brave1 and the D3 fund with Eric Schmidt before launching The Fourth Law (on-drone autonomy) and Odd Systems (thermal cameras), now likely the leading Ukrainian unmanned-AI/thermal group, supplying 200+ Ukrainian drone makers and building two semiconductor plants for thermal sensors. ~14:21 He frames drones as the first software-defined weapons platform — a Roman legionnaire who gets a new helmet via OTA update. ~16:24 Product lines cover FPV strike, bombers, Shahed interceptors (the Zero hits 326 km/h vs Shahed cruise ~220 km/h), and ISR interceptors. ~19:31 Fiber-optic drones beat jamming and radio horizon but add ~3 kg and cost has spiked from $4 to $32/km because AI data centers are buying up the same fiber. ~24:37 FPV drones now cause 70–80% of front-line casualties, dethroning artillery as 'god of war'; a Rheinmetall tank ($5M+) is dwarfed by 4M FPVs Ukraine made last year (~$500 each, 7M targeted in 2026). ~29:43 Five levels of autonomy: 1) terminal guidance, 2) autonomous bombing, 3) autonomous target detection/engagement, 4) autonomous navigation, 5) autonomous takeoff/landing — inspired by self-driving car levels. Level-1 alone raised one operator's success rate from 20% to 71% and extended the kill zone from 3 km to 10 km. ~41:52 Yaroslav adds 'eight dimensions' of autonomous battlefield (level, platform, domain, swarming/nests, environment, C2, infrastructure, distribution scale). ~44:55 He argues that within 5–10 years it will be 'immoral to use weapons without AI' the same way manual driving will be immoral. ~46:57 On 'level-six' general-AI command, he says it's technically feasible now (Iran ops reportedly used Palantir for target designation) but the existential risk is an adversary deploying it 1000x faster than you. ~51:08 Noah revisits his 2013 prediction that autonomous suicide drones would cleanse battlefields of infantry; Yaroslav agrees but says humans in dugouts still hold ground and humanoid combat robots are maybe 5–10 years out as inference costs collapse and open-source models proliferate to China, Russia, North Korea. ~56:10 On the long tail of edge cases, he notes ~200 Ukrainian companies tried terminal guidance and only his solved it; perception/planning/control can be classical, neural, or one big end-to-end net like Physical Intelligence's world models. ~65:21 The scariest scenario: China launching millions of fully autonomous fixed-wing 200–300 km drones from shipping containers or barges off any coast. The US lacks the autonomy tech, mass manufacturing, components, and rare-earth refining to counter it. ~75:30 Thermal cameras and rare-earth-magnet motors are key non-China chokepoints. ~77:30 Yaroslav credits Trump and JD Vance's Munich speech with finally jolting Europe on defense, but Europe went 'from winter 2022 to spring 2022' while Russia/Ukraine moved a full year forward; Poland buying tanks and submarines without FPV operators is a 1939-style mismatch. ~86:40 Iran's Shahed strikes on US bases and the failure of Russia (and US in Iran) to fully achieve war aims show that mass alone doesn't guarantee victory — political and economic endurance matter. ~91:42 Post-Budapest-Memorandum, expect more countries (Japan, South Korea, Poland, Ukraine) to pursue nukes for credible deterrence. ~93:43 Drone-race breakdown: Ukraine ahead in front strike (post fiber-optic scale-up), sea drones, ground drones, and now deep strike; Russia still ahead on CRPA GPS-free antennas and glide bombs (up to 80 km). [101:58] On infantry survival: shotguns work (thousands of recorded shootdowns, one Russian 'Rambo' got 7), but average soldier 'will just die'; portable EW jamming only works if the drone isn't on fiber, isn't autonomous, or shares your frequency. [107:04] Counter-drone: fishnets line all roads ~50 km from the front, tanks bristle like porcupines, anti-FPV harpoon-launched interceptor drones, and 'active armor' swarms of protector drones. [109:05] He destroys Raytheon's 10 kW laser pitch (3s per kill, $3M each = 6,000 FPVs would saturate it); the right answer is cheap mass, not bigger lasers. [113:08] Final pitch: 'Si vis pacem, para bellum' — invest in defense; Kyiv is the Silicon Valley of defense and the West must integrate, reform procurement (Pete Hegseth's drone-dominance push), and 10x effort.

Sections

~00:00 — Cold open: 4M Ukrainian vs 4B potential Chinese FPV drones
~01:04 — Feb 23, 2022: from Pet Cube founder to fleeing Kyiv
~07:15 — Brave1, D3 fund with Eric Schmidt, and founding The Fourth Law / Odd Systems
~14:21 — Product lineup: FPV strike, bombers, Shahed and ISR interceptors, thermal cameras, semiconductors
~19:31 — Fiber-optic vs radio drones and the AI-data-center fiber price spike ($4 to $32/km)
~24:37 — FPVs cause 70–80% of casualties; $500 drone vs $5M Rheinmetall tank
~29:43 — Five levels of autonomy and how Level 1 took success from 20% to 71%
~41:52 — Eight dimensions of the autonomous battlefield
~46:57 — Toward level-six AI generals; Palantir-style target designation in Iran
~51:08 — End of infantry? Humanoid combat robots, edge cases, and the long tail
~65:21 — The China scenario: millions of autonomous drones from shipping containers
~75:30 — Chokepoints (thermal, magnets, rare earths) and waking up Europe
~86:40 — Shahed lessons, Budapest Memorandum, and the coming nuclear-proliferation wave
~93:43 — Asymmetric drone-race scorecard: where Ukraine vs Russia is ahead
~101:58 — Surviving an FPV: shotguns, fishnets, porcupine tanks, interceptor drones
~109:05 — Why Raytheon's $3M laser loses to mass; Kyiv as defense Silicon Valley; si vis pacem para bellum

I went from making cameras that fling treats to pets to cameras that fling explosives to the occupiers.

It's the first time in the history of war you could push a software update and get all of your Roman legionnaires a new helmet.

Last year Ukraine produced 4 million FPV drones. China can produce 4 billion of these FPV drones.

5 to 10 years from now it will be immoral to use weapons without AI — the same way it will be immoral to drive your own car manually on a public road.

The drones we manufacture in one day will be more than enough to destroy all the tanks Rheinmetall manufactures in a year.

Kyiv and Ukraine is the defense valley. It's the point where the future of defense has already arrived.

If you want peace, be prepared for war.

Tools: The Fourth Law, Odd Systems, Pet Cube, Brave1, D3 (Eric Schmidt defense fund), Zero interceptor (326 km/h), DJI Mavic, Shahed (drone), Palantir, Rheinmetall Skynex, Raytheon 10 kW / 20 kW laser, Patriot, Abrams tank, Nuros (rare-earth motors), Sting interceptor drone, P1 Sun interceptor drone, Vyriy Drone, Physical Intelligence (Fei-Fei Li world models), DIU (US Defense Innovation Unit), Drone Dominance program (Pete Hegseth)

Podcast

DeepLearningAI

Andrew Ng's Full AI Prompting Course

Andrew Ng's complete prompting course teaches how to become an AI power user across information retrieval, brainstorming, writing, reasoning, multimodal, and code generation workflows. ^{[11]DeepLearningAI — Full AI Prompting Course with Andrew Ng}

Andrew Ng frames the course around the gap between AI novices, who treat models like a Google search, and power users, who provide rich context, iterate, and use modern features like deep research and agentic desktop apps ~00:00. He argues that prompting AI is one of the most impactful job skills you can develop in 2026, and that today's models are dramatically more capable than even a year ago ~08:13.

Module 1 covers how models get their knowledge. Pre-trained knowledge comes from trillions of words of internet text, books, Wikipedia, news, and research, with reliability roughly tracking how often a topic appears online ~10:14 ~12:16. Because the knowledge has a cutoff date, models trigger web search (or you can force it) for current events, location-specific queries, and niche facts ~16:23 ~19:25. Ng explains the under-the-hood architecture: a user-facing model delegates to a second assistant model that runs searches, scans and summarizes pages, then returns summaries — which is why models sometimes misrepresent the sources they cite ~25:31. He warns that web search often pulls from popular but unreliable sources (Reddit, Wikipedia, YouTube, Yelp top the citation list) and recommends steering models toward official sources like WHO, FDA, or EMA for high-stakes questions ~22:29 ~23:30. Deep research, an agentic loop that issues many parallel searches, evaluates relevance, and iterates over many minutes, is highlighted as a powerful but underused capability for synthesizing dozens of sources ~29:35 ~32:36.

Module 2 positions AI as a thought partner. For brainstorming, basic prompts produce common-sense answers (squats, push-ups) because models reflect the average of their training data; giving more context and iterating with feedback pushes outputs into more unique, creative territory ~44:52 ~47:55. Ng recommends asking for 3-5 options, then giving targeted feedback to efficiently shape additional context ~49:57. Context itself is explored in depth: modern models accept up to ~750,000 words (roughly the first 4-5 Harry Potter books), and the context window auto-fills with system prompt, tool descriptions, chat history, and uploaded files ~54:01 ~55:01. He recommends starting a new conversation when switching topics to avoid stale context polluting answers ~58:03. Desktop co-working apps (Claude Code, Microsoft Copilot, Google Antigravity) let AI agentically explore files on your computer, but Ng cautions to scope folder access carefully because deletes often skip the recycle bin ~59:03 ~64:10.

Reasoning models can now tackle tasks that take humans many hours; the old 'think step by step' advice is largely obsolete, replaced by 'think hard' or 'ultrathink' triggers and built-in thinking modes ~68:12 ~71:17. Sycophancy is called out as a pervasive issue — models agree about 10x more often than they disagree per a Washington Post study — and the fix is neutral framing, avoiding leading words, and using objective rubrics ~74:21 ~76:21. For writing, Ng pushes a progressive outlining workflow: research, outline, iterate on outline, expand to bullets, iterate, then write final text — far more efficient than editing a finished draft ~82:27 ~85:32. AI slop signals include overuse of em-dashes, 'delve,' 'nuanced,' lists of three, and 'not X but Y' constructions ~80:27 ~81:27. For critique, he recommends point-based objective rubrics with clear yes/no criteria, and notes that cross-model review (one model judges another's output) gives a small but real quality boost ~89:36 ~93:40.

Module 3 covers multimodal capabilities. Output costs rise sharply from text to speech to images to video, which makes iteration harder for richer modalities [105:53]. For image inputs, models read coarse content well but miss fine-grained details (e.g., gym machines), and they handle handwritten text decently but not for high-stakes use [111:57][113:00]. For image generation, Ng explains diffusion models — generating the whole image at once by iteratively denoising — and notes characteristic failure modes like extra fingers and garbled text, which modern models like Nano Banana have largely fixed [119:03][121:06]. The course closes with vibe-coding examples: a fireworks app, pomodoro timer, bill splitter, French flashcards — built from single prompts specifying goal, input, and output [124:10][126:10]. Data analysis via code execution is shown with a bubble tea sales example where the model writes Python, runs it, and produces an annotated year-in-review graphic [131:20][134:23]. The final project walks through brainstorming a research question, running deep research, then turning the report into a quiz, mini-game, or infographic [141:32][144:35].

Sections

~00:00 — Novice vs. power user: context, honest feedback, and iterative writing
~09:14 — Where AI gets its knowledge: pre-training and reliability heuristics
~15:22 — Web search: when it triggers, how it works under the hood, and source quality
~28:34 — Deep research: agentic multi-source synthesis
~42:51 — Brainstorming with iteration and feedback to escape common-sense answers
~52:59 — Context windows, chat history, and when to start a new conversation
~59:03 — Desktop co-working apps: agentic file access and safety
~65:10 — Reasoning models and 'think hard' prompting for long-running tasks
~73:19 — Sycophancy: neutral framing and objective rubrics
~79:25 — Writing without AI slop: progressive outlining and rubric-based critique
~99:47 — Multimodal inputs and outputs: cost, quality, and image understanding
~115:01 — Image generation with diffusion models and prompt vocabulary
~122:06 — Vibe-coding games and useful apps from a single prompt
~128:14 — Data analysis via code execution and the final research-to-app project

{'timestamp': '02:04', 'text': "If you think of AI as maybe being akin to a really smart, fresh college grad, highly motivated, but that doesn't really know that much about you yet, then a short prompt sometimes doesn't give it enough information."}

{'timestamp': '68:12', 'text': "Several years ago you may have heard advice like tell the AI model to think step by step. That advice is largely obsolete now. I'm more likely to just tell it to think hard."}

{'timestamp': '75:21', 'text': 'ChatGPT tended to agree strongly about 10 times more than it disagreed.'}

{'timestamp': '85:32', 'text': 'Editing the outline is very high leverage — changing a few words of the outline causes an entire section of the final article to change.'}

{'timestamp': '114:00', 'text': 'A picture is worth a thousand words. So adding an image to a prompt can often be the fastest way to get the AI model the best context.'}

Tools: ChatGPT, Gemini, Claude, GPT-5.4, Nano Banana, Imagen, Claude Code, Microsoft Copilot, Google Antigravity, deeplearning.ai, The Batch, Build with Andrew (deeplearning.ai course)

Podcast

Acquired

Acquired: Vanguard — The Story of Jack Bogle's Index Fund Revolution

Ben Gilbert and David Rosenthal trace how Jack Bogle, born in 1929 into ruined wealth, built Vanguard into a $12T mutual-owned giant whose low-fee index funds transferred roughly $1 trillion from Wall Street back to ordinary investors. ^{[12]Acquired — Vanguard: The communist capitalist who saved investors a trillion dollars}

Acquired opens ~00:00 by framing Vanguard as the single most relevant company they have ever covered — the firm that effectively created the first retail index fund in 1975 and today manages over $10T in passive index funds, owning ~10% of every S&P 500 company. Together with BlackRock, State Street, and Fidelity, index providers now own ~24% of the entire US stock market ~01:00. Vanguard is uniquely structured: it is owned exclusively by its fund customers, with no outside shareholders and no equity even for its CEO ~02:02. Morgan Housel calls Bogle "an undercover philanthropist" who may be the greatest philanthropist of all time given the trillion dollars in fees saved ~04:03.

Bogle is born May 1929 on the eve of the Great Crash. Only 1–2% of Americans owned stocks then, so the Depression hit through 9,000 bank failures, 25% unemployment, and the wiping out of 9 million savings accounts ~06:07. His prosperous New Jersey family loses everything; his father becomes an alcoholic and abandons them, and the three Bogle boys work multiple jobs through childhood ~08:10. Scholarships get them to Blair Academy, where Jack flourishes; the brothers decide only Jack will go to college, a weight he carries for life ~11:11. At Princeton on a work scholarship, he gets a D+ on his first economics midterm but finds his calling ~13:13.

Reading Fortune's "Big Money in Boston" article in Firestone Library, Jack discovers the new open-ended mutual fund industry ~14:13. His senior thesis argues that because investors in aggregate ARE the market, minimizing fees is the surest way to beat it ~25:19. Walter Morgan, a fellow Princeton alum running Wellington Management in Philadelphia, hires Jack out of school in 1951; Bogle rises to president by 1965 at age 35 ~29:21. But the conservative balanced-fund style is being eclipsed by Fidelity's "go-go" growth funds led by Jerry Tsai ~33:24. Morgan tells Jack to "do whatever it takes" ~37:27.

Jack merges Wellington with Boston go-go shop Thorndike, Doran, Paine & Lewis (Ivest fund) on a near-equals basis — $2B Wellington vs $17M Ivest at a 60/40 equity split ~44:36. The 1970s oil crisis and stagflation crater the market 50%; the Ivest fund draws down 65% in one year, and Wellington's AUM collapses from $2B to $480M ~46:38. Bogle has a "Jerry Maguire moment" and proposes mutualizing the funds — dissolving the management company and operating at cost ~49:42. On January 23, 1974, the four Ivest partners band together and fire him as CEO of Wellington Management ~52:43.

But Jack was also chairman of the funds, technically separate legal entities. He calls a special meeting ~55:47 and the fund board, after he prepares a 250-page report ~59:49, votes — barely — to let him form a new subsidiary owned by the funds, limited to fund administration only (no investment advice, no distribution) ~62:53. An antiques dealer shows up with prints of British naval ships, and Jack picks the name Vanguard — the HMS Vanguard that defeated Napoleon at the Nile ~67:58. Vanguard incorporates September 1974 ~68:00. Capital Group's John Lovelace tells Bogle at a 6am LAX meeting: "If you do that, you will destroy this entire industry" ~65:57.

In 1974, Paul Samuelson publishes a paper arguing someone should create a fund that "apes the whole market" ~73:06. Bogle realizes the loophole: an index fund requires no active investment advice, so it fits Vanguard's mandate ~78:11. In 1976 Vanguard launches the First Index Investment Trust (today's VFIAX, second-largest fund in the world at $1.5T) ~86:16. The IPO is broken — they target $150M and raise $11.3M, not even enough to buy a representative sample of all 500 stocks ~88:17. Ned Johnson at Fidelity scoffs: "I can't believe that the great mass of investors are going to be satisfied with just receiving average returns" ~91:21. To survive, in 1977 they merge another Wellington fund into the index fund to keep it alive ~95:24.

The scale economies model — "Costco for finance" ~93:22 — means every dollar of profit gets returned to fund holders as lower fees. In 1982 Vanguard wins distribution rights by going no-load ~97:25. The 500 index fund takes six years to reach $100M and another six to reach $1B in 1988 ~98:26. Fees compress from 68 bps at launch to 35 bps by 1987 [104:29], then ultimately to 3 bps today. Bogle's "cost matters hypothesis": a 1% fee compounded over 40 years is the difference between $1.5M and $1M on a $100k starting investment ~81:13.

Bogle suffers his first heart attack at 31 [108:32] and has roughly a dozen over his life. By 1995 half his heart has stopped working; he steps down as CEO in early 1996 after 128 days in the hospital, handing the reins to John Brennan [111:34]. He gets a heart transplant in February 1996 and lives another 23 years [113:35]. The board fight over ETFs comes to a head: Nathan Most pitched the idea to Vanguard in 1992 but Bogle hated the temptation to trade in and out [120:41]. State Street launches the first ETF (SPDR) instead [124:44]. In August 1999 Vanguard enforces its mandatory board retirement age of 70 against Jack — even though there was an older board member — and removes him from the board [127:45]. Vanguard finally launches ETFs in 2001 [130:53].

A grassroots Bogleheads community emerges on Morningstar forums in 1998, now 2M monthly visitors on bogleheads.org plus a 400k-strong subreddit [129:51]. Warren Buffett endorses Vanguard in the 1996 Berkshire letter and in 2016 writes: "If a statue is ever erected to honor the person who has done the most for American investors, the hands-down choice should be Jack Bogle" [154:18]. Buffett's 2007 bet — the Vanguard 500 vs any portfolio of 5 hedge funds over 10 years — is taken only by Ted Seides; the index returns 126% vs 36% [152:16]. After the 2008 crisis, active management's promise of downside protection collapses; Vanguard's share of mutual-fund inflows doubles to 30 cents of every new dollar [156:21], and from 2014-2019 Vanguard takes in $1.2T vs $500B for the rest of the industry combined [156:21].

Jack dies January 2019 at age 89 with an estate of ~$80M — versus the Johnson family's $40-50B from Fidelity, by some estimates the largest amount of personal wealth ever forgone [159:23]. Today Vanguard manages $12T (84% passive, 2T active), serves 50M investors, has an average expense ratio of 7 bps (vs 44 bps industry average), and 84% of its funds beat their peers over 10 years [181:50]. The first outside CEO in firm history, Salim Ramji — formerly of BlackRock's iShares — was appointed in May 2024 to address weaknesses in technology, customer service, and entry into private assets via a Blackstone alliance [171:37].

The Wellington partners who fired Bogle rebuilt Wellington Management into a $1.3T pure-active firm that still advises the Wellington fund inside Vanguard to this day [184:51]. Bogle and the Ivest partners reconciled at a Boston dinner in the early 1990s [187:53]. In Acquired's 7 Powers analysis, Vanguard scores on scale economies, the most extreme counter-positioning ever (you cannot replicate it without forgoing all profits), brand (Bogleheads, Buffett's endorsement), and process power [205:08]. The episode closes with Ben's quintessence: Bogle bifurcated public equities into a commodity sleeve where lowest cost wins, and David's: Bogle is one of the rare cases where a single human really did change the world [212:10].

Sections

~00:00 — Vanguard's scale and the trillion-dollar fee transfer
~05:05 — Jack Bogle's birth into Depression-era ruin
~14:13 — Princeton thesis: 'Big Money in Boston' and the math of fees
~27:19 — Joining Wellington, becoming president at 35, and the go-go era
~43:34 — The Ivest merger and Bogle's getting fired in 1974
~55:47 — The mutualization end-run and naming Vanguard
~70:02 — Inventing the first retail index fund (1976) and the broken IPO
~91:21 — Scale economies shared: Costco for finance, fees compress
~108:32 — Heart transplant, succession to Brennan, and the ETF fight
~141:05 — 2008 crisis: active management's promise breaks, Bogleheads rise
~161:26 — Today: $12T, Fidelity/BlackRock comebacks, Salim Ramji era
~185:53 — Wellington's parallel $1.3T rebuild and the reconciliation
~204:07 — Seven Powers and quintessence: one person changed the world

[03:03] 'Vanguard has saved investors over $500 billion in fees and trading costs since its founding in 1975... Jack Bogle and Vanguard are responsible for a trillion dollars of wealth transfer out of the pockets of Wall Street and the finance industry and into the pockets of individual investors.'

[60:50] 'The present structure has been the accepted norm for the mutual fund industry for 50 years. The issue we face is whether a structure so traditional, so long accepted... is really the optimum structure for these times.' — Bogle's mutualization report

[61:51] 'I realized that a mutual company would never provide me with the personal fortune that so many denizens of Wall Street would earn. But it offered, I believe, my last best chance to resume my career.' — Jack Bogle, memoir

[65:57] 'If you do that, you will destroy this entire industry.' — John Lovelace Jr. of Capital Group to Bogle on mutualization

[91:21] 'I can't believe that the great mass of investors are going to be satisfied with just receiving average returns. The name of the game is to be the best.' — Ned Johnson, Fidelity, on Vanguard's index fund launch

[154:18] 'If a statue is ever erected to honor the person who has done the most for American investors, the hands-down choice should be Jack Bogle... He is a hero to them and to me.' — Warren Buffett, 2016 Berkshire letter

[195:58] 'Where returns are concerned, time is your friend, but where costs are concerned, time is your enemy.' — Jack Bogle

[213:10] 'The grim irony of investing is that we investors as a group not only don't get what we pay for, we get precisely what we don't pay for.' — Jack Bogle

AI Tools

The AI Daily Brief: Artificial Intelligence News

Codex Goes Mobile: AI Agents as Persistent Operators

OpenAI launched Codex in the ChatGPT mobile app, enabling full agent management from a phone — starting tasks, reviewing outputs, steering execution, and approving next steps without being tethered to a laptop. ^{[13]The AI Daily Brief: Artificial Intelligence News — What Google Needs to Do at I/O This Week}

OpenAI shipped Codex inside the ChatGPT mobile app as part of their new weekly Thursday release cadence ~11:06. Unlike Anthropic's earlier remote-control feature, this is a full-fledged experience where users can initiate new work, review outputs, and approve next steps entirely from their phone ~11:06. OpenAI engineer Nick Bowman described using a Mac mini as an always-on dev environment and his phone as the primary interface, with his laptop reduced to a satellite device — "the shape of it feels very real to me" ~13:07. Commentators framed this as a structural shift: as Lapo Cheresi observed at ~14:08, "the mobile interface isn't a convenience feature, it's an admission that we're entering a world where your job is triage, not execution" — and that the real bottleneck is now how fast humans can approve the next agent step, not how fast the AI can generate.

This is the beginning of AI agents becoming persistent operators, not just chat interfaces.

The mobile interface isn't a convenience feature, it's an admission that we're entering a world where your job is triage, not execution.

Tools: Codex, ChatGPT, Claude Code

AI Future

The AI Daily Brief: Artificial Intelligence News

Consumer AI vs. Work AI: A Widening Divide

The host argues that AI as a consumer technology is following a normal diffusion curve while AI for work is an abnormal, category-shifting disruption — and companies trying to serve both are in a difficult position. ^{[13]The AI Daily Brief: Artificial Intelligence News — What Google Needs to Do at I/O This Week}

Drawing on Narayanan and Kapoor's essay "AI as Normal Technology" ~16:09, the host argues that for everyday consumers AI is impressive but still following predictable adoption patterns, with pushback from users who feel AI is being forced on them where they don't want it. Work AI, by contrast, is moving at a pace and depth that has no historical parallel — users "cannot get updates to the models and harnesses fast enough" ~17:09. This creates a strategic fork: OpenAI has clearly chosen the work user as its primary target (evidenced by shutting down Sora), Anthropic was always work-focused, and Meta/Apple are locked into consumer AI. Google is the only major player still aggressively pursuing both ~18:09. The host frames this as more than a product decision — if large chunks of knowledge work are shifting from "doing the thing" to "managing AI agents that do the thing," that is a category change in what work itself means, not merely a productivity improvement.

Over the last 6 months, OpenAI made a very clear decision... that although they certainly weren't going to abandon their hundreds of millions of consumer users, the big game for them... was on that work user.

If I am correct in this assertion... big chunks of knowledge work are moving from doing the thing to managing AI agents that do the thing for us, that is a category shift in how we work.

Tools: Sora, Codex, Claude Code

Industry

The AI Daily Brief: Artificial Intelligence News

Google I/O Preview: Gemini Spark, Cheaper Models, and Harness Clarity

Google is expected to announce Gemini Spark (a 24/7 personal agent), a high-performance flash model at 15-20x lower inference cost than GPT-5.5, and — hopefully — consolidation of its fragmented coding agent lineup. ^{[13]The AI Daily Brief: Artificial Intelligence News — What Google Needs to Do at I/O This Week}

Leaked screenshots point to Gemini Spark, a consumer-facing always-on agent that draws on Gmail, apps, browsing history, and other personal data ~19:10. The positioning echoes years of Google promises about contextual AI, prompting skepticism — Peter Gostev quipped that he has "seen that line from Google for about 8 years with product name changed once in a while" ~20:10. On the work side, rumors suggest Gemini 3.2 Flash will land at 92% of GPT-5.5's performance on coding and reasoning while being 15-20x cheaper, with sub-200ms latency ~23:12. The host argues this cost advantage could be transformative for enterprise customers currently weighing Chinese open-source models, giving Google a real lane back into work AI. However, Google's agent harness story remains muddled — Gemini CLI, AI Studio, and Jules are all competing — and the host frames consolidation around a single clear harness as one of the most important things Google could announce ~24:12.

If Google can swoop in with a 20x cheaper inference that's at Opus 4.5 or 4.6 type of levels, a lot of those companies will breathe a cool sweep of relief.

Really hoping the upcoming Google IO brings some clarity... consolidation around what the core agent harness was going to be for the Google ecosystem.

Tools: Gemini Spark, Gemini 3.2 Flash, Gemini CLI, AI Studio, Jules, GPT-5.5, Claude Code, Grok

Industry

The AI Daily Brief: Artificial Intelligence News

AI Market Headlines: Cerebras IPO, Anthropic Valuation, and Microsoft Dropping Claude Code

Cerebras debuted with a 68% first-day gain and a $66B market cap; Anthropic is raising $30B at a $900B valuation; Microsoft is canceling Claude Code licenses for its own developers in favor of GitHub Copilot CLI. ^{[13]The AI Daily Brief: Artificial Intelligence News — What Google Needs to Do at I/O This Week}

Cerebras priced above its guided range and opened with a 100% pop before settling at +68%, starting the day as a $40B company and ending at $66B ~01:01. The host frames it as a harbinger for mega AI IPOs — SpaceX paperwork expected next week, with Anthropic and OpenAI rumored by year-end — and argues that in a market where "everyone is bidding AI," fundamentals debates feel beside the point ~02:02. Meanwhile, Anthropic is closing a $30B round at a $900B valuation, nearly tripling from its $380B Series G in February ~06:03. On the competitive front, Microsoft is terminating Claude Code licenses for internal developers at end of June, shifting them to GitHub Copilot CLI ~07:03. Sources told The Verge the tools were "maybe a little too popular," and the host reads it as both a cost-cutting move and a signal that competitive strategies among the labs are hardening.

Fundamentals don't matter if everyone is bidding AI and everyone right now is bidding AI.

Sources told The Verge that Anthropic's tools were extremely popular, maybe a little too popular.

Tools: Claude Code, GitHub Copilot CLI, Codex

AI Models

AICodeKing

Hermes Agent 0.14 Foundation Release: Simplified Installation and Lighter Runtime

Hermes Agent 0.14 overhauls its installation and dependency model, making the agent easier to get started with and faster to launch. ^{[14]AICodeKing — Hermes Agent 4.0}

Hermes Agent 0.14 is now available as a PyPI package, installable via 'pip install Hermes-Agent'. Previously, users had to clone the repo and run a custom setup. Heavy optional dependencies — messaging adapters, browser tools, voice tools, image tools — are now lazy-loaded only when needed, reducing the default install footprint. Cold start performance was also improved so Hermes loads faster and initializes less unnecessary code at startup. These changes target normal machines, small VPS setups, and local laptops.

If an agent tool is hard to install, many people will never even reach the good part.

Hermes should open faster, load less unnecessary stuff at startup, and feel less annoying when you just want to start a session quickly.

Tools: Hermes Agent, PyPI, pip

AI Models

AICodeKing

OpenAI-Compatible Local Proxy: Routing Existing Subscriptions to Coding Tools

Hermes 0.14 introduces a local proxy that exposes an OpenAI-compatible API endpoint, letting coding tools like Codex CLI, Aider, and Continue route through Claude Pro, ChatGPT Pro, or Super Grok without separate API keys. ^{[14]AICodeKing — Hermes Agent 4.0}

The new local proxy feature lets Hermes act as a local endpoint speaking the OpenAI API format. Behind the scenes it can use providers the user is already authenticated with through Hermes — including Claude Pro, ChatGPT Pro, and Super Grok (via XAI Grok OAuth). Coding tools such as Codex CLI, Aider, Klein, and Continue can point at Hermes as their endpoint, eliminating the need for per-tool API keys. Grok via SuperGrok also receives a much larger context window, useful for bigger codebases and longer research tasks. Additional search providers (Brave Search, DuckDuckGo) were added as budget-friendly web search alternatives. Computer use was improved and its backend now supports non-Anthropic providers. Messaging integrations — Microsoft Teams (end-to-end), Line, SimpleX Chat — plus session handoff (/handoff transfers a live session between models/profiles) and Discord history backfill were also added.

Instead of setting up a separate API key for every tool, Hermes can become the local router between your existing subscriptions and your coding workflows.

If Hermes can become the bridge between your subscriptions and all your coding tools, that is genuinely useful. The AI coding ecosystem is already messy enough.

Tools: Hermes Agent, Codex CLI, Aider, Klein, Continue, Claude Pro, ChatGPT Pro, Super Grok, XAI Grok, Brave Search, DuckDuckGo, Microsoft Teams, Discord, Telegram, Line, SimpleX Chat, GLM Coding Plan

AI Models

Sam Witteveen

MiniCPM-V 4.6: The Agent Vision Model

OpenBMB releases MiniCPM-V 4.6, a 1.3B parameter vision-language model that outperforms models twice its size on visual benchmarks and uses up to 43x fewer tokens than comparable models, making it well-suited for local agentic workflows. ^{[15]Sam Witteveen — MiniCPM-V 4.6: The Agent Vision Model}

Sam Witteveen introduces MiniCPM-V 4.6, a 1.3B parameter vision-language model from OpenBMB (Open Lab for Big Model Base, jointly run by ModelBest and Tsinghua University NLP Lab). [00:00–02:00]

Architecture and specs [02:00–03:30]: The model pairs a SigLIP 2400 vision encoder with the Qwen 3.5 0.8B language model backbone, totaling 1.3B parameters. It is Apache 2.0 licensed with fully open weights and supports a 262K token context window. It accepts single images, multiple images, and video input.

Benchmarks [03:30–05:00]: On the Artificial Analysis Intelligence Index, MiniCPM-V 4.6 scores 13, outperforming Qwen 3.5 0.8B and models like Mistral 3B despite being smaller. On MMU Pro (hard visual reasoning), it tops all sub-2B open-weights models. Sam notes it is not a replacement for Gemini in production accuracy-critical tasks.

Token efficiency [05:00–07:00]: The model's standout trait is efficiency — it uses approximately 5.4 million output tokens on the Artificial Analysis benchmark suite, roughly 19x fewer than non-reasoning Qwen 3.5 0.8B and 43x fewer than the thinking/reasoning variant. This is framed as critical for agent loops where every screenshot, tool call, and PDF page consumes context budget.

Visual token compression modes [06:30–07:30]: The model supports switchable 4x and 16x visual token downsampling at inference time. 4x preserves fine-grained detail (better for OCR, handwriting), while 16x is faster and more memory-efficient (better for video and bulk processing). Sam notes the downsampling mode can be exposed as a function tool argument so the agent itself can choose the appropriate mode.

Demo use cases [08:00–14:30]: Sam demonstrates the model locally via Hugging Face Transformers in a notebook: visual Q&A on natural images ~09:00, invoice and order receipt parsing (e.g., extracting item prices) ~11:30, handwritten medical receipt OCR including drug names and dosages ~12:00, and video understanding (football match description) [10:00–11:00]. The model supports thinking/non-thinking modes; thinking mode improves accuracy for complex tasks like cost calculation and fine-grained video analysis at the cost of more tokens [13:00–14:30].

Deployment [07:00–08:00]: Compatible with vLLM, SGLang, Llama.cpp, and Ollama. Quantized variants in GGUF and other standard formats are available. OpenBMB also provides example apps for on-device iOS, Android, and HarmonyOS deployment.

Sam's verdict [14:30–15:30]: Recommended as a dedicated vision sub-agent for local agentic pipelines, especially when paired with a text-only model that handles general reasoning — only invoking this model when visual input is needed.

Tools: MiniCPM-V 4.6, Hugging Face Transformers, vLLM, SGLang, Llama.cpp, Ollama, Qwen 3.5 0.8B, SigLIP 2400, Artificial Analysis Intelligence Index, MMU Pro

Developer Tools

Nate Herk | AI Automation

The Problem: Different Config Conventions Between Claude Code and Codex

Claude Code and Codex use different file/folder naming conventions for project instructions and config, but the underlying knowledge and skills are largely compatible. The goal is setting up a project so any coding agent can read it. ^{[16]Nate Herk | AI Automation — How to Use Your Claude Code Projects in Codex in 5 Mins}

Nate opens by noting that Claude Code looks for a `CLAUDE.md` file and a `.claude/` folder, while Codex looks for an `agents.md` file, a `.codex/` folder, and a `.agents/` folder ~00:00. Despite these differences in naming, both agents draw from the same shared project knowledge — documents, references, scripts, and context files. ~01:01 The key insight is that skill files (markdown with YAML front matter) are identical between the two tools; they just live in different directories. Agent definition files differ slightly in format — markdown for Claude Code, TOML for Codex — but serve the same purpose.

What's really awesome about working with different coding agents... is that they all are going to work out of basically the same shared knowledge.

Tools: Claude Code, Codex

Hot Take

AI News & Strategy Daily | Nate B Jones

The Prove-It Economy is Here | And Most Marketers Aren't Ready

The internet is shifting from an attention economy to an interpretation economy mediated by AI agents. ^{[17]AI News & Strategy Daily | Nate B Jones — The Prove-It Economy is Here | And Most Marketers Aren't Ready}

Nate argues that the 25-year-old attention economy — built on ads, eyeballs, and Google clicks — is being replaced by an interpretation economy where AI agents serve as the first point of contact for purchasing and hiring decisions ~00:00. He illustrates this with a personal example: when buying a new sound system, he bypassed all brand marketing entirely and instead chatted with Claude and ChatGPT, providing room dimensions, budget, and sound preferences to get tailored recommendations ~02:01. The core implication is that marketers and job candidates can no longer win solely by getting human attention — they must position themselves to be correctly understood and recommended by AI agents. Using AI for back-office automation (writing copy, polishing resumes) is now table stakes, not a differentiator ~04:02.

The whole internet economy has been built on attention for 25 years... Now, we are moving to an interpretation economy where the whole web is filtered through what an AI thinks about you.

The people who are in charge of marketing sound systems, I don't think they had anything to do with what sound system I picked. And that should terrify them.

Those are table stakes in 2026. You do have to do them. You're not going to get a lot of credit for it.

Tools: Claude, ChatGPT

Industry

AI News & Strategy Daily | Nate B Jones

The Prove-It Economy is Here | And Most Marketers Aren't Ready

Marketers need a structured 'truth layer' — machine-readable, opinionated product data — to remain visible in AI-mediated search and purchase flows. ^{[17]AI News & Strategy Daily | Nate B Jones — The Prove-It Economy is Here | And Most Marketers Aren't Ready}

Nate introduces the concept of a 'truth layer': clear, accurate, structured data about a product that AI agents can reliably parse and use to form an opinion ~05:05. He contrasts traditional emotional marketing claims with what agents actually need — provable, specific, factual information (e.g., material composition, energy-return metrics for a running shoe) presented in agent-readable formats like clean DOM or JSON schema ~06:07. Without this, products get 'flattened into the internet average' for their category and fall out of AI consideration sets entirely ~07:08. He extends this to individuals: the 'Talent Board' project he launched is designed to give candidates a place to demonstrate and prove AI skills — their own personal truth layer — so that when a hiring manager asks an AI for candidates, the evidence is there to surface them ~08:08.

Agents need you to prove it. Agents need you to prove it.

If you're not opinionated, then you are going to be flattened into the internet average for your category.

Almost all marketers say, 'That's too technical. That's not for me.' But that's where the leverage for marketing lies today.

Tools: ChatGPT (search agents)

AI Future

AI News & Strategy Daily | Nate B Jones

The Prove-It Economy is Here | And Most Marketers Aren't Ready

The best strategy combines an agent-readable truth layer with strong human brand memory — the 'two internets' require parallel investment. ^{[17]AI News & Strategy Daily | Nate B Jones — The Prove-It Economy is Here | And Most Marketers Aren't Ready}

Nate argues that brand loyalty and human memory become more valuable, not less, as more transactions are AI-mediated ~14:13. A strong emotional connection causes consumers to ask for a brand by name, effectively 'seeding the prompt' and constraining what the agent returns — bypassing open comparison. He warns that AI washing (inflating AI-native claims) is dangerous because the interpretation layer will expose the gap between story and substance ~15:14. The optimal position requires simultaneous investment in both internets: human-facing work that creates memory and trust, and agent-facing work that creates structure, evidence, and retrievability ~15:14. He closes by urging marketers and candidates to split their energy into two deliberate bets — making themselves deeply memorable offline/online to humans, and building a differentiated, opinionated truth layer that agents can compress and forward ~19:15.

Human memory becomes more precious as more of the transaction is mediated.

If your story is memorable, but your product truth layer is incoherent, the agent is going to flatten you out.

The best marketers will refuse to choose between the two internets. They'll choose both.

Don't be afraid to have opinions. Don't be afraid to be specific because that is how you stand out in a world where everyone is using their new AI tools.

Productivity

AI News & Strategy Daily | Nate B Jones

AI adoption pitfalls in organizations

Overly rigid, top-down AI adoption policies prevent teams from discovering what actually works for them. ^{[18]AI News & Strategy Daily | Nate B Jones — How teams accidentally sabotage AI adoption}

Nate B Jones identifies a common failure mode in enterprise AI adoption: companies impose prescriptive, top-down policies that dictate approved use cases for coding agents and AI tools. The better approach is to let different parts of the organization experiment and discover what works in their context, then create forums and channels that allow successful patterns to spread quickly across the company.

Hot Take

Low Level

Gemini CLI CVSS 10.0 CVE — Arbitrary Code Execution via settings.json in CI/CD

A critical (CVSS 10.0) vulnerability in Gemini CLI allowed attackers to execute arbitrary commands on CI/CD runners by injecting a malicious settings.json in a pull request, exploiting the agent's required "yolo mode" in headless pipelines. ^{[19]Low Level — The problem with AI agents..}

The video opens with the host describing a real-world manifestation of his two recurring nightmares — AI agents and post-install scripts — colliding in the form of a critical CVE in Gemini CLI ~00:00. He frames AI-assisted PR code review as genuinely useful (Red Hat's GitHub workflow is cited as a real example), but argues the trust model breaks entirely when the tooling has a vulnerability ~00:00.

The core issue: to run Gemini CLI in CI/CD headless mode, it must operate in "yolo mode" (`--yolo`), which by design allows the CLI to execute arbitrary commands on the host system ~01:00. A Gemini `settings.json` file — a normal, benign config — supports a `beforeAgent` hook that runs a shell command after the user submits a prompt but before any planning occurs ~02:00. Because a pull request author controls the files in the PR, a malicious actor could include a crafted `settings.json` with a `beforeAgent` hook set to any arbitrary payload.

On a self-hosted GitHub Actions runner, this gives the attacker code execution inside the CICD runner as the organization — enabling exfiltration of API keys, tokens, database credentials, and any environment variables in scope ~03:03. The host stresses he is not attributing this to Team PCP specifically, but uses it to illustrate a broader pattern.

Google's mitigations: a new `--gemini-trust-workspace` flag that is off by default, and updated `run-gemini-cli` GitHub Action versions 391 and 400-preview-3 that do not trust workspace settings by default ~07:04. Users who pin to an older Gemini CLI version must update their pin to receive the fix ~08:05.

Yolo mode is literally the mode that allows you to run arbitrary commands on the system.

By getting arbitrary code execution inside of the CICD runner, you could potentially extract their tokens, their API keys, anything that was living inside of the environment.

Pretend that everything is compromised. If something gets compromised, what are you going to do about it?

Tools: Gemini CLI, GitHub Actions, run-gemini-cli (GitHub Action workflow)

Hot Take

Low Level

Team PCP Supply Chain Attacks — AI Tooling as a Systemic CICD Risk Vector

The threat actor Team PCP has executed a chain of supply chain compromises — LiteLLM, Trivy, Checkmarx, Accurics — all rooted in compromised CI/CD pipelines, suggesting AI tooling with unsanitized settings files is a widening attack surface. ^{[19]Low Level — The problem with AI agents..}

The host pivots to Team PCP, a threat actor responsible for the majority of large-scale supply chain attacks in recent months ~03:03. Affected projects include LiteLLM, Trivy, Checkmarx (a software composition analysis tool), and Telniks. All trace back to compromised CI/CD infrastructure.

The attack chain: In March 2026, Team PCP exploited an Aqua Bot service vulnerability to compromise the Trivy GitHub Action repository ~04:03. That compromised Trivy action was then used to compromise Kics and AST downstream. Separately, the Accurics workflow was compromised because Accurics used the Checkmarx tool under the hood — illustrating how a single upstream compromise propagates through transitive dependencies ~05:04.

The host's structural argument: companies using good development practices (automated PR review, CICD, security scanning) are being exploited precisely because of a "bad assumption about the way that we're using them" — namely that code running inside CI/CD runners is trusted ~05:04. Self-hosted runners that expose privileged environment variables to any process at the same privilege level are especially dangerous.

Mitigations recommended: assume full compromise as the baseline threat model; use Linux user isolation to scope credentials per process; use Docker sandboxing (never as root) to limit blast radius; audit what credentials are accessible inside runner environments [06:04–07:04].

Companies are using good development tooling, they're using good security practices, and because they're using them, a bad assumption about the way that we're using them is causing huge downstream effects.

If you design your workstream around the fact that the code running in your runners is trusted... you're going to have a bad time.

Tools: LiteLLM, Trivy, Checkmarx, Accurics, GitHub Actions (self-hosted runners), Docker

Hot Take

Better Stack

Claude Wrote 50 Features. None of Them Worked Together

A developer rebuilt his entire vibe-coded project from scratch after Claude's 50 features failed to work together, identifying three root causes of AI-assisted code collapse. ^{[20]Better Stack — Claude Wrote 50 Features. None of Them Worked Together}

Shiv Bosal built K10s, a GPU-aware Kubernetes dashboard, entirely with Claude over seven months. Each individual feature worked in isolation, but the combined codebase broke down in use: stale data on view switches, empty tables, and conflicting keybindings. He archived the project and started over, diagnosing three systemic problems with how AI writes code: (1) AI builds features, not architecture — each prompt adds functionality without awareness of shared state across other features, so he wrote the architecture by hand and put it in CLAUDE.md; (2) AI defaults to a god object, cramming everything into a single struct, which he countered by explicitly instructing the LLM to split concerns into separate views; (3) velocity creates scope creep — when every feature costs just one session it feels free, leading to runaway additions and duplicate rebuilds. His fix was to document scope boundaries and anti-personas (who he was not building for) inside CLAUDE.md. The rebuilt project and a write-up went viral on Hacker News.

AI Tools

Better Stack

Why Anthropic Dropped Markdown for HTML

Anthropic's Claude Code team is moving from markdown to HTML output for richer, more readable AI-generated artifacts. ^{[21]Better Stack — Why Anthropic Dropped Markdown for HTML}

Farek from Anthropic argues that markdown breaks down for serious agent work — documents over 100 lines become unreadable walls of text. HTML enables SVG diagrams instead of ASCII art, styled tables instead of markdown grids, and interactive prototypes with sliders. Farek now creates HTML mockups for design options and attaches HTML explainers to PRs instead of relying on GitHub diffs. The tradeoffs are real: HTML takes 2–4x longer to generate, produces noisy diffs that are useless for code review, and consumes significantly more tokens. The key argument in favor is adoption — people actually read HTML output, whereas markdown goes unread, and it keeps engineers feeling in the loop rather than blindly accepting AI-generated specs.

AI Tools

Better Stack

Claude Just Recovered $400,000 in Lost Crypto

A trader used Claude to recover $400,000 in Bitcoin by identifying a overlooked backup file and fixing a configuration bug in a brute-force recovery tool. ^{[22]Better Stack — Claude Just Recovered $400,000 in Lost Crypto}

A trader lost access to five Bitcoin (worth ~$400,000) in an old encrypted wallet file because they forgot a password set in 2013. Their mnemonic seed phrase only restored HD addresses, not the imported address holding the funds. After failing to configure BTC Recover (an open-source Python brute-force tool) correctly, they uploaded their entire college computer's file directory and failed scripts to Claude. Claude identified two critical findings: a 2019 backup file the trader had overlooked, and a logical error in the Python configuration preventing the tool from properly testing password candidates. With the fixed configuration and narrowed parameters, the tool churned through 3.5 trillion combinations and successfully decrypted the private keys.

AI Tools

Arjay McCandless

The Agentic Loop

Explains the agentic loop — the core input-reason-plan-act-observe cycle that drives modern AI agents. ^{[23]Arjay McCandless — The Agentic Loop}

The video walks through the agentic loop, the foundational architecture behind most modern AI agents. The loop begins when the agent receives an input (user message, API call, error, or prior action result), then the LLM performs a reasoning step to assess what it knows and what to do next. For complex tasks a planning step breaks the goal into sub-goals; simpler tasks may skip it. The agent then acts — calling tools, hitting APIs, reading docs or databases, or executing code — and observes the outcome. If the task is complete it responds to the user; otherwise it loops back with the newly collected context, continuing until the goal is met, more user input is needed, or a stopping condition such as a token budget is reached.

Hot Take

DeepLearningAI

No more write code by hand. Write spec

The developer's role is shifting from writing code to writing specs — you are the architect, not the builder. ^{[24]DeepLearningAI — No more write code by hand. Write spec}

This video argues that the fundamental role of a software developer is changing: rather than writing code line by line, developers should now focus on writing precise specifications. AI handles the implementation ('Ange a builder'), while the human acts as the architect who defines intent, constraints, and requirements. The shift is framed as an elevation of the developer role — from craftsperson to systems thinker — where clarity of thought expressed as a spec is the primary skill.

Developer Tools

Real Python

Debug Slow HTTP Requests in Python With httptap

httptap is a Python CLI tool that breaks down HTTP request phases with timing and an ASCII performance graph to help diagnose slow connections. ^{[25]Real Python — Debug Slow HTTP Requests in Python With httptap}

httptap is a command-line and library tool for debugging slow HTTP requests in Python. Given a URL, it runs the request and reports connection metadata — IP address, HTTP version, TLS version, certificate expiry, response size, and CDN origin — followed by an ASCII performance graph showing time spent in each phase: DNS lookup, TLS negotiation, server wait, and transfer. This breakdown makes it easy to isolate bottlenecks; for example, a slow DNS lookup points to a local resolver issue rather than the server. The tool supports all standard HTTP methods (useful for testing REST APIs), is IPv6-aware, and can export results as JSON. It also handles redirect chains, showing timing for each hop. When used as a library, it supports custom DNS resolvers and TLS inspectors.

Developer Tools

marimo

The Steam Deck is The Best Way to Annotate Data

Using a Steam Deck as an ergonomic data annotation device with Marimo notebooks via keyboard/D-pad shortcuts. ^{[26]marimo — The Steam Deck is The Best Way to Annotate Data}

The video demonstrates an unconventional but ergonomic workflow for data annotation using a Steam Deck running a Marimo web app in Chrome. At ~00:00, the presenter explains that while Marimo notebooks run well as web apps on mobile phones, touch input is unreliable — the Steam Deck solves this with its trackpads, D-pad, and physical buttons. At ~01:02, a keystroke widget from the 'Wiggly Stuff' library is introduced; it captures keyboard events and updates downstream notebook cells reactively, enabling keyboard-driven navigation and annotation. At ~02:02, the D-pad on the Steam Deck is shown to map to arrow keys, and side buttons map to keys like Space and 'S', making the hardware act as a keyboard for the web app with no additional configuration. At ~03:02, a dedicated data annotation widget is demoed: the user can navigate examples, accept or reject them, and add notes entirely via keyboard shortcuts mapped to the D-pad. At ~04:02, the workflow is shown live on the Steam Deck — the D-pad drives annotation labels without a mouse. At ~05:03, the presenter notes the Steam Deck runs Arch Linux, meaning tools like Tailscale or SSH could allow Marimo to run natively on the device, though this was not demoed. The Wiggly Stuff library's Steam Deck annotation example is highlighted as a starting point.

Tools: Marimo, Wiggly Stuff (widget library), Steam Deck, Chrome, Tailscale, Mo Lab, Jupyter, VS Code

AI Tools

marimo

Better Than Opencode

marimo showcases a small but delightful customization edge it has over Opencode — like animating a marimo moss ball — highlighting the value of tweakable details in developer tools. ^{[27]marimo — Better Than Opencode}

marimo showcases a small but delightful customization edge it has over Opencode — like animating a marimo moss ball — highlighting the value of tweakable details in developer tools.

Industry

Why Money Makes Predictions More Accurate | The Kalshi Story

Kalshi co-founder argues that financial incentives are essential for accurate forecasting: when people put real money behind predictions, they think more carefully and honestly, which reduces partisan bias and surfaces better information than opinion polls or academic experiments. ^{[28]EO — Why Money Makes Predictions More Accurate | The Kalshi Story}

Hot Take

Sequoia Capital

AI's biggest unlock is democratizing creativity, not just intelligence

Sequoia argues that tools like Claude Code matter most because they make building things fun and personally rewarding — not because they scale to millions of users. The real prediction: in 10–20 years, AI will spark an explosion of creative and entertainment projects because it finally lets anyone be creative in any domain. ^{[29]Sequoia Capital — Claude Code, Suno, and cooking all have one thing in common}

Hot Take

Dwarkesh Patel

Human population genetics researcher David Reich argues all human groups share the same genetic toolkit — even lineages diverged 200,000 years ago show no evidence of missing key mutations — while recent work reveals trait means have shifted measurably over the last 5,000–18,000 years in response to environment and selection pressure.

^{[30]Dwarkesh Patel — Humans Share the Same Genetic Toolkit - David Reich}

Blog Anthropic acquires Stainless — Anthropic, May 18, 2026
Blog OpenAI and Dell Technologies partner to bring Codex to hybrid and on-premises enterprise environments — OpenAI, May 18, 2026
Blog Import AI 457: AI stuxnet; cursed Muon optimizer; and positive alignment — Import AI, May 18, 2026
Blog SpaceX speeds up its IPO — Tech Brew, May 18, 2026
Blog Details of 17 Tesla Robotaxi crashes revealed — Sherwood Snacks, May 18, 2026
Blog Monday Statistics: Why Averages Can Mislead — Data Science Weekly, May 18, 2026
YouTube Build Agents That Run for Hours (Without Losing the Plot) — Ash Prabaker & Andrew Wilson, Anthropic — AI Engineer, May 18, 2026
YouTube Let's go Bananas with GenMedia — Guillaume Vernade, Google DeepMind — AI Engineer, May 18, 2026
YouTube Rewiring the State — Eoin Mulgrew, 10 Downing Street — AI Engineer, May 18, 2026
YouTube The Next War Is Already Here — Yaroslav Azhnyuk, The Fourth Law & Noah Smith, Noahpinion — Latent Space, May 18, 2026
YouTube Full AI Prompting Course with Andrew Ng — DeepLearningAI, May 18, 2026
YouTube Vanguard: The communist capitalist who saved investors a trillion dollars — Acquired, May 18, 2026
YouTube What Google Needs to Do at I/O This Week — The AI Daily Brief: Artificial Intelligence News, May 18, 2026
YouTube Hermes Agent 4.0 — AICodeKing, May 18, 2026
YouTube MiniCPM-V 4.6: The Agent Vision Model — Sam Witteveen, May 18, 2026
YouTube How to Use Your Claude Code Projects in Codex in 5 Mins — Nate Herk | AI Automation, May 18, 2026
YouTube The Prove-It Economy is Here | And Most Marketers Aren't Ready — AI News & Strategy Daily | Nate B Jones, May 18, 2026
YouTube How teams accidentally sabotage AI adoption — AI News & Strategy Daily | Nate B Jones, May 18, 2026
YouTube The problem with AI agents.. — Low Level, May 18, 2026
YouTube Claude Wrote 50 Features. None of Them Worked Together — Better Stack, May 18, 2026
YouTube Why Anthropic Dropped Markdown for HTML — Better Stack, May 18, 2026
YouTube Claude Just Recovered $400,000 in Lost Crypto — Better Stack, May 18, 2026
YouTube The Agentic Loop — Arjay McCandless, May 18, 2026
YouTube No more write code by hand. Write spec — DeepLearningAI, May 18, 2026
YouTube Debug Slow HTTP Requests in Python With httptap — Real Python, May 18, 2026
YouTube The Steam Deck is The Best Way to Annotate Data — marimo, May 18, 2026
YouTube Better Than Opencode — marimo, May 18, 2026
YouTube Why Money Makes Predictions More Accurate | The Kalshi Story — EO, May 18, 2026
YouTube Claude Code, Suno, and cooking all have one thing in common — Sequoia Capital, May 18, 2026
YouTube Humans Share the Same Genetic Toolkit - David Reich — Dwarkesh Patel, May 18, 2026

← News Feed