The Pope picks a side in the AI debate

AI Future, Industry

Pope Leo XIV released his first encyclical, Magnifica Humanitas: On Safeguarding the Human Person in the Time of Artificial Intelligence, which makes him the first pope to treat AI as the defining social question of his papacy.^{[1]The AI Daily Brief — What the Pope Actually Said About AI} Its central argument: human worth can't be reduced to an intelligence benchmark, and even a "smarter" AI remains categorically different because it "does not feel joy or pain" and learns only by statistical adaptation. The launch event drew Chris Olah, co-founder of Anthropic, which opened its Milan office the same day.^{[2]Anthropic — Anthropic opens Milan office to support Italian enterprise, research, and developers}

The AI Daily Brief frames the encyclical as a teaching letter modeled on Leo XIII's 1891 Rerum Novarum (which addressed workers' rights in the industrial revolution) — a parallel the American-born Leo XIV chose deliberately ~10:04. The thrust isn't "AI good or bad" but positioning real living humans, not artificial intelligences or anonymous market forces, as the barometer of success ~15:06.

The most-quoted passage is paragraph 99: AI systems "merely imitate certain functions of human intelligence... do not undergo experiences, do not possess a body, do not feel joy or pain... and do not know from within what love, work, friendship, or responsibility mean" ~18:10. On economics, the passage cited reads: "the pursuit of greater profits cannot justify choices that systematically sacrifice jobs because the human person is an end, not a means." On data colonialism, paragraph 178 warns that those who control "the health data of entire peoples... possess a structural leverage over the future."

Everything that appears as a limit, incapacity, illness, old age, suffering, vulnerability, tends to be seen primarily as a defect to be corrected rather than as a reality through which our humanity matures.

The reaction was split: Timnit Gebru criticized Anthropic's invitation to the ceremony, Blake Scholl called the jobs take "bad," and Dean Ball complained the encyclical's "axiomatic denial of AI cognition is a punt of the highest order" — a critique others mocked as expecting the pope to believe a machine can be ensouled ~12:04. The host's verdict: most takes are bad, the document is "flag planting" for future debate, and the mere fact that the Church dedicated a first encyclical to AI signals the urgency he hopes governments emulate ~21:13.

Also in the episode: Anthropic's Project Glasswing / Mythos rollout (10,000+ high/critical vulnerabilities found, Mozilla patched 271) and a reported secret $9B US intelligence inference cluster on Nvidia Blackwell.

Industry

Simon Willison, Theo - t3.gg

Have the AI labs actually found product-market fit?

Simon Willison argues the real product-market fit for Anthropic and OpenAI isn't ChatGPT — it's enterprise coding agents.^{[3]Simon Willison — I think Anthropic and OpenAI have found product-market fit} ChatGPT hit 900M weekly users but only 5.6% pay; enterprise coding customers now pay $200+/month per seat, and Anthropic's estimated Q2 2026 revenue reached $10.9B (up from $4B in August 2025). Theo independently flags the flip side: a Claude billing change that punishes anyone coding outside the official CLI.^{[4]Theo - t3.gg — How I code with AI changed a lot}

Willison shows the economics with his own usage: roughly $1,200/month of Claude Code tokens plus $980/month of Codex tokens, while paying only $200/month combined — a ~10x subsidy. Both labs re-priced enterprise plans in April 2026 (Anthropic moved to "$20/seat/month plus API pricing"; OpenAI aligned Codex with API token costs), and newer models cost more per token: GPT-5.5 is 2x GPT-5.4, Opus 4.7 ~1.4x Opus 4.6. A SpaceX S-1 reportedly revealed a commitment to pay Anthropic "$1.25 billion per month through May 2029" for compute. Willison pushes back on "AI cost alarm" stories (Uber's overrun, Microsoft seat cancellations) as predictable growing pains, and names the upcoming Anthropic and OpenAI IPO filings as the real validation test.
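
The subsidy figure is simple arithmetic on the numbers quoted above. A minimal sketch, using only those three dollar amounts (the function name `subsidy_ratio` is made up for illustration):

```python
# Monthly figures quoted in the paragraph above (USD).
claude_code_tokens = 1_200
codex_tokens = 980
subscription_paid = 200


def subsidy_ratio(token_value: float, paid: float) -> float:
    """Return how many dollars of usage each subscription dollar buys."""
    if paid <= 0:
        raise ValueError("paid must be positive")
    return token_value / paid


total_usage = claude_code_tokens + codex_tokens
ratio = subsidy_ratio(total_usage, subscription_paid)
print(f"${total_usage:,}/month of tokens for ${subscription_paid}/month paid = {ratio:.1f}x")
```

Run as written, the ratio comes out just under 11, consistent with the "~10x" shorthand.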

Theo's gripe is concrete ~9:06: Claude Code CLI/desktop users get up to $5,000 of usage for $200/month, but use Claude via Conductor, T3 Code, or claude -p over SSH and you're capped at only $200 before paying API rates directly — which he calls user-hostile.

If you use Claude Code via the CLI or desktop app, you can get up to $5,000 of usage for 200 bucks a month. But if you use it in something like Conductor or T3 Code... then you only get $200 of usage. — Theo

Industry

Tech Brew, Simon Willison

Uber torched its entire AI budget by April

Uber deployed Claude Code company-wide in December and ran internal leaderboards pushing engineers to "tokenmax" — then burned through its full annual AI budget by April, a "head-exploding moment" for president Andrew Macdonald.^{[5]Tech Brew — Uber's tokenmaxxing reality check} Even with 95% of engineers using AI monthly and ~70% of committed code AI-generated, Macdonald admits "the link is not there yet" between Claude Code usage and shipping consumer features.

Tech Brew frames it as the central tension of 2026: high adoption and code-generation metrics don't automatically convert into faster shipping or user-visible features. Raw token consumption — juiced by competitive internal leaderboards — can inflate cost without proportional output. The backdrop is brutal: GPU rental prices have more than doubled since January 2026, and both OpenAI and Anthropic launched premium tiers partly to offset rising compute.

The link is not there yet between extensive Claude Code usage and actual consumer-facing features.

Worth reading alongside Simon Willison, who cites the same Uber overrun as a growing pain rather than product rejection ^{[3]Simon Willison — I think Anthropic and OpenAI have found product-market fit} — the two pieces are a near-perfect bull/bear pairing on enterprise AI ROI.

Hot Take

Lenny's Podcast

The AI “jobpocalypse” isn’t real

A guest on Lenny's Podcast pushes back on mass-unemployment narratives: visible layoffs are as much about over-hiring and slowing growth as AI.^{[6]Lenny's Podcast — The AI jobpocalypse isn't real} What models actually do is make "yesterday's human competence" cheap and universally deployable — flooding the market with commoditized output and pushing humans to build something genuinely new on top.

The argument: when everyone draws from the same models, default outputs look identical — homogenized "slop" that loses value precisely because it's everywhere. The value shifts upward, to people who use the cheap frozen competence as a substrate for differentiated work, rather than disappearing. It's the same thesis Dan Shipper develops at length in the Every interview below.

What a new model drop does is they make yesterday's human competence cheap... How do I use this to make something new and interesting?

AI Tools, Developer Tools

OpenAI

OpenAI's self-improving tax agents

OpenAI and Thrive Holdings built "Tax AI" for 30+ Crete accounting firms, using a Codex-driven feedback loop that lifted field-completion accuracy from 25% to 86% in six weeks across 7,000 returns.^{[7]OpenAI — Building self-improving tax agents with Codex} One senior accountant cut her annual tax-prep time from 180 hours to 15.

The self-improvement loop rests on three pillars: structured practitioner feedback captured as field-level review rows; production traces recording the full path from source docs through extraction, mapping, and filing; and a Codex iteration loop that converts repeated failure patterns into scoped engineering tasks with targeted evals. Codex inspects the trace, repo, and skills together, proposes fixes, reruns targeted and regression evals, and surfaces a candidate PR — without mutating production evidence; ambiguous cases route back to engineers.

The reusable pattern: a bounded task environment separating a writable worktree (the scoped product surface Codex can modify) from read-only production context (traces, source documents, predictions, finalized returns). The same three-part design is now being applied to bookkeeping, audit, and IT help desk. Building on OpenAI's published harness-engineering work and the Symphony orchestration framework, the Schedule E (rental property) work took ~six weeks but produced reusable abstractions that accelerated Schedule C and A.
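
The writable/read-only split can be pictured as a path check. This is a rough sketch of the idea only, not OpenAI's implementation: the class name, method names, and directory layout are all assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TaskEnvironment:
    """Hypothetical bounded environment: one writable worktree, read-only evidence."""

    writable_root: PurePosixPath
    read_only_roots: tuple[PurePosixPath, ...]

    def can_write(self, path: str) -> bool:
        p = PurePosixPath(path)
        # Reject ".." segments so a path cannot climb out of the worktree.
        if ".." in p.parts:
            return False
        return p.is_relative_to(self.writable_root)

    def can_read(self, path: str) -> bool:
        p = PurePosixPath(path)
        if ".." in p.parts:
            return False
        # The worktree is readable too; evidence roots are readable only.
        return self.can_write(path) or any(
            p.is_relative_to(root) for root in self.read_only_roots
        )


# Example layout (paths are invented for this sketch).
env = TaskEnvironment(
    writable_root=PurePosixPath("/task/worktree"),
    read_only_roots=(PurePosixPath("/task/traces"), PurePosixPath("/task/returns")),
)
print(env.can_write("/task/worktree/src/mapper.py"))
print(env.can_write("/task/traces/run1.json"))
```

The point of the design is that production evidence can inform a fix without ever being a valid write target.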

For Codex, the result is not a vague alert but a scoped engineering task with evidence, editable product surfaces, and explicit validation gates.

Industry

Anthropic

Anthropic opens its sixth European office in Milan

Anthropic opened a Milan office, its sixth in Europe after London, Dublin, Paris, Zurich, and Munich, and is already working with Generali, Enel, Pirelli, Angelini Pharma, and Bracco.^{[2]Anthropic — Anthropic opens Milan office to support Italian enterprise, research, and developers} Standout deployments: JAKALA rolled Claude out to 3,000+ seats and freed ~70% of senior team time; Satispay compressed an 18-month roadmap to seven months.

Thomas Remy leads Southern Europe (Chris Ciauri as MD International), with a stated mission to support Italian enterprise, research, and culture through a "safe AI transition." Bending Spoons now co-authors the majority of its code changes with Claude Code. The launch coincided with Pope Leo XIV's AI encyclical, where co-founder Chris Olah was invited to discuss ethics; during Milan Design Week, Anthropic ran workshops with Alcova Milano on Claude in design tools.

Industry

Sherwood Snacks, Last Week in AI

AI mints a 12th trillion-dollar company — and a bubble warning

Micron became the 12th US company past $1T market cap, driven entirely by AI memory demand after UBS tripled its price target from $535 to $1,625.^{[8]Sherwood Snacks — US now has a dozen $1 trillion companies} But a Last Week in AI clip warns the bull run rests on financial engineering: banks are repurposing European "Significant Risk Transfers" to skirt post-2008 concentration limits and keep funding AI.^{[9]Last Week in AI — The Bull Run That Never Ends}

Micron shares jumped 19% on the upgrade and are up 800% over the year; analyst Timothy Arcuri argues "a large chunk of demand is already locked down." The trillion-dollar club now spans the full semiconductor stack — Qualcomm surged on a reported ByteDance chip deal, and Nvidia tied up with nuclear startup Oklo, a sign frontier compute demand is now driving power-generation investment ^{[8]Sherwood Snacks — US now has a dozen $1 trillion companies}.

The bear case ~0:00: the clip likens today's cycle to the pre-2008 mortgage bubble in The Big Short. Banks are hitting concentration limits "designed to prevent exactly this kind of concentrated exposure," and reaching for SRTs (a European instrument built for diversified portfolios) as a workaround. The whole thesis hinges on whether recursive self-improvement and superintelligence actually arrive.

If recursive self-improvement works, if superintelligence happens, this will be the bull run that never ends.

Developer Tools, Hot Take

Simon Willison

Open source pushes back on AI slop

SQLite added an AGENTS.md that formally bans agentic code contributions — recently strengthened by deleting a "(currently)" qualifier — while welcoming AI bug reports only with reproducible test cases.^{[10]Simon Willison — sqlite AGENTS.md} The mood is echoed in a Real Python story about a bot that mass-submitted PRs renaming .rst files to .rs (mistaking reStructuredText for Rust) and a Simon Willison Star Trek gag about agents that acknowledge orders but never execute them.^{[11]Simon Willison — Quoting Kyle Ferrana}

SQLite will review well-written proof-of-concept submissions before reimplementing them itself, and routes the influx of inconsistent AI bug reports to a separate Bug Forum where D. Richard Hipp resolves issues directly. The drive-by-PR story (a maintainer spending five minutes to close a context-free rename that made no sense for a library containing zero Rust) captures the hidden tax low-effort AI contributions impose at scale. And Willison's Kyle Ferrana quote — a TNG parody where Data fails to raise shields despite being ordered to, causing hull breaches — is a pointed metaphor for the gap between an agent acknowledging a task and actually doing it ^{[11]Simon Willison — Quoting Kyle Ferrana}.

SQLite does not accept agentic code.

AI Models, Industry

Anthropic

Coding agents arrive in academic social science

An Anthropic survey of 1,260 quantitative social scientists found 81% have tried AI chatbots but only 20% regularly use coding agents — and among those, Claude Code dominates at 86%.^{[12]Anthropic — Coding agents in the social sciences} Economists lead adoption at 39%; education trails at 4%.

Authors Thomas Lyttelton, Maxim Massenkoff, and Nathan Wilmers ran the baseline in February–March 2026. Adoption is starkly uneven: political scientists 25%, public health and communications 6%, education 4%; researchers with typically male names adopt at 2x the rate of female-named peers, and top-25 universities show 40% higher adoption. Coding-agent users started ~25% more projects and posted ~50% more working papers over six months — but no rise in journal submissions yet, suggesting a pipeline lag rather than a ceiling. The framing: agents that "take a research idea and a dataset, write and run an analysis, interpret the output, and iterate autonomously."

Podcast

The Pragmatic Engineer

OpenCode's Dax Raad: 8M users, and AI still hasn't made us faster

Dax Raad, co-founder of OpenCode — now the most popular open-source coding harness at ~6.5–8M monthly actives — argues that despite building and living in an AI coding tool, he's "thinking as hard as I ever have."^{[13]The Pragmatic Engineer — Building OpenCode with Dax Raad} He's blunt about hype, inference margins, the GPU bottleneck, and why none of his competitors have a productivity edge.

OpenCode grew from 650K monthly actives in December to 2.5M in January to ~6.5–8M by recording, with 10M as the next milestone ~19:08. Raad reframes AI's gain as moving from 95% thinking / 5% doing to 96% / 4%: "a 20% improvement, but day-to-day it feels as hard as ever" ~45:21. The GTM was deliberate: claim the "open source" position (the Linux of coding agents) and "pick one temporary bad guy and galvanize all their competitors." When Anthropic abruptly cut Claude subscriptions in OpenCode at 9pm one Thursday, Raad messaged OpenAI overnight and had official support by end of day ~21:08.

On economics: OpenCode Zen (inference aggregation) hit a $50M run rate in 5–6 months; he sees ~80% margins as a middleman and guesses the labs run ~90%, levels he considers indefensible long-term. Even OpenCode is GPU-bottlenecked as hyperscalers "vacuum up" supply ~36:15. His memo to the team admitted AI turbocharged three problems: shipping features not worth shipping, silently absorbing the agent's hacks (killing the "muted prickle" of judgment), and not cleaning up.

Jump to:

~03:01 — Thesis: easier coding, but thinking as hard as ever
~16:07 — 650K → ~8M user growth
~21:08 — The Anthropic ban and the “galvanize competitors” GTM
~28:13 — Dev tools are B2C; harness quality came second
~32:13 — OpenCode Zen ($50M run rate) + enterprise control plane
~36:15 — ~80–90% inference margins and the GPU bottleneck
~40:18 — Hot takes: productivity hype, CFO LLM costs, the “24–29” BS
~48:23 — The team memo: hacks, the “muted prickle,” cleaning up
~66:30 — Taste, craft, guardrails for agent “idiots,” career advice

None of our competitors are crushing us either... All our competitors are super into AI. So you would think in our space there would be a huge gap, but there just isn't.

There is a world where the net result of all these AI coding tools is the same amount of work gets done, but all the engineers are happier... That's not good enough for a lot of companies.

Podcast

Every

Every's “After Automation”: more AI, more humans

Every COO Brandon interviews founder Dan Shipper on his essay After Automation: despite being as agent-native as a company gets — "if you swing a stick around in our Slack, you're as likely to hit a human as you are an agent" — Every grew from 4 to ~30 people while automating everything.^{[14]Every — We Automated Everything With AI and Tripled Our Headcount} The mechanism: AI makes "yesterday's expert competence cheap," which raises demand for experts to build guardrails and tackle the newly possible.

Default prompting floods the zone with work that's "close but not quite right" ~7:02, devaluing that output category while sharply increasing demand for experts who (a) build systems that shepherd slop into useful work (repo rules, PR review guidelines) and (b) use the higher floor to build the previously-impossible. The durable gap is agency vs. autonomy: agents are great at autonomous task execution but lack self-motivated agency, and there's no economic incentive to build an agent that refuses to work ~19:08. Shipper's definition of AGI: "any agent that you never turn off" ~20:10.

On layoffs, both push back on the ClickUp CEO bragging about cutting ~22% of staff, plus Meta and Square — arguing these reflect bloat or failing businesses that then blame AI ~28:15. Customer-service automation often backfires (fire the reps, beg them back two months later).

Jump to:

~02:01 — The paradox: 4 → 30 people while automating everything
~06:02 — AI makes yesterday’s expert competence cheap
~09:03 — Flooding the zone; demand for experts and systems
~11:03 — Agents looking back at humans; Zeno’s paradox
~15:06 — Agency vs. autonomy; the agent you never turn off
~22:13 — ClickUp layoffs, service backfires, blaming AI
~33:18 — “Ride the models” and Dan’s AI writing workflow

AI makes yesterday's expert competence cheap.

Podcast

Y Combinator

Inside YC's AI playbook

YC partners and GP Pete Koomen walk through the agent infrastructure YC built internally, and make a strong case that startups should leapfrog incumbents by going fully AI-native now.^{[15]Y Combinator — Inside YC's AI Playbook} The unlock that started it all: giving agents read-only SQL access to YC's production database, which one partner "surreptitiously pushed out late at night."

It began as a self-serve agent harness for YC's finance team, replacing engineer-written Ruby ~2:01. The structural advantage: YC runs on its own software with all context in one Postgres schema, so agents answer arbitrary cross-team questions — triggering a Jevons effect where people dared to ask far more ~8:04. Primitives: a shared tool registry that grew from ~20 to 350+ tools, skills as an abstraction over tools, and nightly self-improving "dream cycle" loops where a general agent reads all employee agent conversations to find missing context ~19:14.

The strongest takes ~24:21: start a startup because incumbents lock context down "for safety"; be willing to spend $10K–$100K/year on tokens to "live in 2028" and leapfrog every Fortune 500 via a one-time time-warp (the cost drops to a few hundred dollars within two years). The close is Koomen's "Horseless Carriages" thesis (AI software should be agents wrapping deterministic tools, not the reverse) and Garry Tan's macro fork ~40:36: a 1984-vs-personal-computer moment over the next 18–24 months between centralized "five kings" control and a homebrew personal-AI revolution.

Jump to:

~02:01 — Building YC’s agent harness for finance
~06:04 — The unlock: read-only SQL on the production DB
~10:06 — Advice for legacy orgs: denormalize into a “big table”
~14:09 — Shared context, 350+ tool registry, skills as resolvers
~19:14 — Self-improving dream cycles; the two-sentence-description skill
~24:21 — Why founders should start startups; spend to “live in 2028”
~32:28 — Horseless Carriages: agents wrapping tools
~40:36 — The 1984 fork: centralized control vs. personal AI

What if we just gave the thing complete access to the production database... I surreptitiously pushed it out late at night. And it worked.

Podcast

Max Agency

Cogent's three-agent architecture for autonomous cyber defense

Cogent CTO Geng Sng explains why autonomous defense is suddenly necessary — the mean time to exploit has collapsed from ~2.5 years to minutes — and walks through Cogent's "Agent Lake," three agent types, an autonomy trust ladder, and a fleet of LLM-as-judge evals.^{[16]Max Agency — Inside Cogent's three-agent architecture for autonomous defense — Geng Sng}

With frontier models finding novel zero-days in battle-tested code (Anthropic's model needed ~500 turns on Mozilla), the hard problem isn't finding bugs but fixing them at machine speed, which needs organizational context and judgment ~1:01. Cogent's "Agent Lake" stores billions of lifecycle events/day on S3 (not a graph DB, which can't handle the write throughput) and materializes custom graphs on demand ~5:03. Three agent types — interactive (low-latency), background (long-running, retryable), and internal coding agents — operate inside deep sandbox isolation where a policy engine elevates access deterministically, so a hallucinating agent literally can't take a write action ~18:14.

Customers climb a trust ladder: read-only with an approve button → auto-routing → auto-validation in sandboxes → auto-remediation; the real blocker is knowing production impact ("banks will never let agents touch a payment processor") ~12:11. Sng's closing thesis: defensive security may be "one of the last bastions" to resist commoditization because every enterprise's "castle" is built differently, making the data extremely hard to model ~72:46.
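
A deterministic policy check of this kind might look like the sketch below. It is not Cogent's engine: the rung names follow the ladder described above, while the action names, the protected-asset list, and `is_allowed` are hypothetical.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Rungs of the trust ladder, lowest to highest."""

    READ_ONLY_WITH_APPROVAL = 1
    AUTO_ROUTING = 2
    AUTO_VALIDATION = 3
    AUTO_REMEDIATION = 4


# Hypothetical action names mapped to the minimum rung that permits them.
REQUIRED_RUNG = {
    "read_finding": Autonomy.READ_ONLY_WITH_APPROVAL,
    "route_ticket": Autonomy.AUTO_ROUTING,
    "validate_in_sandbox": Autonomy.AUTO_VALIDATION,
    "apply_fix": Autonomy.AUTO_REMEDIATION,
}

# Hypothetical assets that never accept automated fixes, whatever the rung.
PROTECTED_ASSETS = frozenset({"payment-processor"})


def is_allowed(action: str, asset: str, granted: Autonomy) -> bool:
    """Decide with plain rules, so model output cannot change the answer."""
    required = REQUIRED_RUNG.get(action)
    if required is None:
        return False  # unknown actions are denied by default
    if action == "apply_fix" and asset in PROTECTED_ASSETS:
        return False
    return granted >= required


print(is_allowed("apply_fix", "web-frontend", Autonomy.AUTO_VALIDATION))
print(is_allowed("apply_fix", "payment-processor", Autonomy.AUTO_REMEDIATION))
```

Because the decision is a lookup and a comparison, an agent that hallucinates a justification still cannot talk its way into a write.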

Jump to:

~01:01 — Why now: mean time to exploit collapsed to minutes
~05:03 — Agent Lake: agentic data lake on S3
~12:11 — Autofixing and the autonomy trust ladder
~16:13 — Three agent types: interactive, background, coding
~18:14 — Deep sandbox isolation + policy engine
~36:28 — Two-tier evals and LLM/agent-as-judge
~41:30 — Hot vs cold context and the knowledge graph
~50:34 — Cogent Research: formal verification, RL, benchmarks
~72:46 — Why defensive security resists commoditization

This is basically SOC 2.0. Attackers only need to be right once and defenders right all the time. And today the mean time to exploit is in minutes.

Podcast

Latent Space

The bitter lesson comes for proteins — Alex Rives

Alex Rives (head of science at BioHub, formerly EvolutionaryScale / Meta FAIR) calls himself "the most bitter-lesson person in protein biology" and lays out why scaling laws apply to biology.^{[17]Latent Space — The Bitter Lesson is Coming for Proteins — Alex Rives, BioHub} The newly MIT-open-sourced fourth-gen ESM "Cambrian" world model is the centerpiece, built on the world's largest protein database — 6.8B non-redundant sequences, ~1.1B resolved structures.

The thesis: evolution generated billions of protein sequences whose statistical patterns (co-varying residues) encode the underlying biology; a masked LM forced to predict the amino acids evolution chose must learn the latent constraints on structure and function ~4:02. The key ESMC breakthrough over ESM2 (same parameters, more compute) was data: adding billions of metagenomic sequences removed ESM2's diminishing returns, yielding a clean scaling law — proving ESM2 was data-limited, not compute-limited ~18:09. Sparse autoencoders revealed an emergent hierarchical feature space mirroring a century of reductive biology, including a single feature for the "nucleophilic elbow" motif.
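
For readers new to masked language modeling, the data-preparation step can be sketched in a few lines. This is a toy illustration, not ESM's pipeline: the 15% mask rate is the common BERT-style default and an assumption here, and real models operate on tokenized batches rather than Python lists.

```python
import random

MASK = "<mask>"


def mask_sequence(sequence: str, mask_rate: float = 0.15, seed: int = 0):
    """Hide a fraction of residues; return masked tokens and the hidden targets."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    rng = random.Random(seed)
    n_masked = max(1, round(len(sequence) * mask_rate))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    tokens = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # what the model must predict
        tokens[pos] = MASK          # what the model actually sees
    return tokens, targets


# Arbitrary amino-acid string used only as example input.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
tokens, targets = mask_sequence(protein)
print("".join(t if t != MASK else "_" for t in tokens))
print(targets)
```

Training then scores the model on how well it recovers each hidden residue from the surrounding context, which is where the co-variation signal comes in.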

Unlike AlphaFold's heavy priors and reliance on MSAs, ESM is a vanilla transformer learning structure from data alone — and because it needs no MSAs, it does much better on antibodies, where evolution favors diversity ~25:13. Inverting the model as a searchable world model has produced mini binders and antibodies/scFvs at therapeutic-level affinity in early trials. BioHub (a philanthropy) committed $400M to an internal Virtual Biology Initiative plus $100M to outside efforts.

Jump to:

~01:01 — The bitter-lesson thesis: scaling since 2018
~04:02 — Why evolution’s sequence patterns encode biology
~06:03 — ESM Cambrian and the 6.8B-protein atlas
~10:04 — SAEs reveal an emergent hierarchical feature space
~18:09 — ESMC: metagenomic data removes diminishing returns
~25:13 — Vanilla transformer vs AlphaFold; no MSAs needed
~28:19 — Designing antibodies by searching the world model
~38:28 — The new paradigm: data generation, prediction, feedback
~57:42 — BioHub’s $400M+$100M Virtual Biology Initiative
~65:46 — How much protein data is left

There are no longer diminishing returns to scale. That's really saying that ESM2 was data limited rather than compute limited.

Podcast

Matt Williams

Matt & Ryan: local AI, Hermes vs OpenClaw, and the cost of always-on agents

A wide-ranging chat covering OpenAI's Elixir-based Symphony orchestration, a hands-on Hermes-vs-OpenClaw comparison, the token economics of scheduled agents, and a string of curiosities from AI "Bixonmania" hallucinations to reverse-engineering siren formats.^{[18]Matt Williams — matt and ryan have a chat}

Both hosts prefer Hermes (Nous Research) over OpenClaw ~17:09: sane minimal defaults, auto-created skills, and — crucially — reproducible installs, where OpenClaw varied across "two-to-three dozen wipes." On economics ~22:12: scheduled/heartbeat tasks (every 5–30 min) burn tokens fast because each run re-invokes the model; the advice is to prove a task with an agent, then move repeatable work to n8n/Make/Zapier. Kimi K2.6 is called the best value right now, with newer models too output-token-hungry ~24:14.
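
The heartbeat cost follows directly from how often the model is re-invoked. A back-of-the-envelope sketch, where the per-run token count is a made-up placeholder rather than a figure from the episode:

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int) -> int:
    """Number of scheduled invocations in 24 hours."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return MINUTES_PER_DAY // interval_minutes


def daily_tokens(interval_minutes: int, tokens_per_run: int) -> int:
    """Tokens consumed per day when every run re-invokes the model."""
    return runs_per_day(interval_minutes) * tokens_per_run


# Hypothetical: each heartbeat re-sends about 20k tokens of context.
TOKENS_PER_RUN = 20_000

for interval in (5, 30):
    print(
        f"every {interval} min: {runs_per_day(interval)} runs/day, "
        f"{daily_tokens(interval, TOKENS_PER_RUN):,} tokens/day"
    )
```

A 5-minute heartbeat runs six times as often as a 30-minute one, so under these assumptions it burns six times the tokens for the same task.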

On Symphony ~7:01: its build "landed on Elixir" because fail-forward primitives fit agents better than GC-style approaches, though it requires Linear and GitHub work trees. The bigger idea: agent orchestration's real consumer is agents, not humans — maybe the Kanban UI is for the viewer, not the system.

Jump to:

~07:01 — OpenAI’s Symphony and why it landed on Elixir
~11:05 — AGI skepticism — can models “think”?
~17:09 — Hermes vs OpenClaw: reproducible installs, auto-skills
~22:12 — Token economics of scheduled agents
~24:14 — Best value: Kimi K2.6; Ollama Cloud limits
~33:20 — Bixonmania and AI-villain training fiction
~46:27 — Reverse-engineering siren formats with GenAI

I could not make a repeatable video [for OpenClaw] because the experience was different every single time... with Hermes, it was the same every single time.

Podcast, Developer Tools

AI Engineer

Comprehend first, code later — Priscila Andre de Oliveira at AI Engineer

Sentry engineer Priscila Andre de Oliveira analyzed 116 of her own Claude sessions and found 67% were comprehension prompts and only 2% code generation — and built a "catch me up" skill around it.^{[19]AI Engineer — Comprehend First, Code Later — Priscila Andre de Oliveira, Sentry} Her thesis: in a 15-year codebase with 100 PRs merged daily, the biggest AI unlock isn't generation, it's comprehension.

Since December 2025 she's stopped writing code and only prompts ~2:09. Sentry's codebase — 15+ years, ~100 PRs/day, 100k orgs depending on it — demands constant re-orientation ~3:10. Her "catch me up" skill is a structured markdown prompt with six exploration modes (architecture, convention, feature trace, syntax, testing, history), demoed live against an unfamiliar repo ~10:20. She cites Jack Naish and Armin Ronacher on the risks of vibe-coding without understanding.

Jump to:

~02:09 — Since Dec 2025: only prompting, no coding
~03:10 — Sentry’s codebase: 15+ years, 100 PRs/day
~04:12 — Internal AI tools: Abacus, Warden, Junior
~09:18 — 67% comprehension, 2% code generation
~10:20 — The “catch me up” skill: six exploration modes
~12:22 — Live demo on an unfamiliar repo
~14:22 — Comprehend before you prompt

The biggest unlock from AI in a large codebase isn't generation. It's comprehension.

Podcast, Developer Tools

AI Engineer

Why Rust is the ideal language for vibe-coding — Daniel Szoke at AI Engineer

Sentry's Rust SDK maintainer makes a contrarian case: Rust's strict compiler makes it better for AI-generated code — not despite being hard for LLMs to write, but because of it.^{[20]AI Engineer — Why Rust is the Ideal Language for Vibe-Coding — Daniel Szoke, Sentry} Every compile error is a bug caught before production.

Conventional wisdom says Python and TypeScript win at vibe-coding because they're easy for LLMs to produce runnable code on the first try ~2:15. Szoke challenges whether ease-of-generation is a virtue: LLMs are non-deterministic and fallible, tests can't prove correctness, and LLM-written tests share the code's failure modes ~5:18. Borrowing Harari's "alien intelligence" framing, he argues the failure modes can be wholly unexpected — subtle bugs behind clean-looking code. Rust's compiler is a deterministic guardrail, and the agent loop (write → compile → read error → fix) suits compiler output perfectly; a data race that silently corrupts results in TypeScript is a hard compile error in Rust ~12:31.

Jump to:

~02:15 — Why dynamic languages feel easy for LLMs
~03:16 — The hidden cost: flexibility makes mistakes easy
~05:18 — Limits of tests and review agents
~07:23 — LLMs as alien intelligence
~09:27 — Rust’s compiler as deterministic guardrail
~12:31 — Fearless concurrency: data race caught at compile time
~14:35 — Agent loop + compiler errors = bugs caught early

Rust is harder for LLMs to get right on the first try because there's so many rules they need to follow. But I think this is a good thing.

Podcast, Developer Tools

AI Engineer

The maturity phases of running evals — Phil Hetzel at AI Engineer

Braintrust's Phil Hetzel maps the four phases teams move through building agent evals — from documented vibe-checks to LLM-as-judge to full-trace evals to automated CI pipelines — with a sharp warning not to trust an LLM judge just because you "put a robe and a cloak on it."^{[21]AI Engineer — The maturity phases of running evals — Phil Hetzel, Braintrust}

Evals aren't unit tests: focus on the specific failure modes your subject-matter experts know, not exhaustive coverage ~5:16. The three primitives are a task, a dataset, and scoring functions. Phase 1 accepts vibe-checking but insists annotators document thumbs-up/down with written justifications; Phase 2 turns those into failure modes scaled via LLM-as-judge (which must itself be validated against human ground truth) and deterministic checks; Phase 3 handles agents calling external systems by evaluating full traces and snapshotting system state; Phase 4 points to automated topic modeling and CI-style eval pipelines driven by Claude Code plus a provider CLI ~16:24.
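
The three primitives fit in a few lines of plain Python. This is a generic sketch, not Braintrust's API: `run_eval`, the scorers, and the toy task are all invented for illustration, and a real task would call a model or agent.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def exact_match(output: str, expected: str) -> float:
    """Deterministic check: 1.0 when the answer matches, ignoring case and spaces."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def non_empty(output: str, expected: str) -> float:
    """Deterministic check: 1.0 when the task produced any answer at all."""
    return 1.0 if output.strip() else 0.0


def run_eval(
    task: Callable[[str], str],
    dataset: list[dict],
    scorers: dict[str, Scorer],
) -> dict[str, float]:
    """Run the task on every case and average each scorer over the dataset."""
    if not dataset:
        raise ValueError("dataset must contain at least one case")
    totals = {name: 0.0 for name in scorers}
    for case in dataset:
        output = task(case["input"])
        for name, scorer in scorers.items():
            totals[name] += scorer(output, case["expected"])
    return {name: total / len(dataset) for name, total in totals.items()}


def toy_task(question: str) -> str:
    """Stand-in for a model call; it gets the third question wrong on purpose."""
    canned = {"2+2": "4", "capital of France": "Paris", "3*3": "6"}
    return canned.get(question, "")


dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "paris"},
    {"input": "3*3", "expected": "9"},
]

results = run_eval(toy_task, dataset, {"exact_match": exact_match, "non_empty": non_empty})
print(results)
```

An LLM-as-judge scorer would slot in as one more entry in `scorers`, and per the talk it should first be checked against human labels before its numbers are trusted.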

Jump to:

~03:15 — Why evals matter: risk, brand, offense
~05:16 — Primitives: task, dataset, scoring functions
~07:17 — Phase 1: documented vibe-checking
~10:20 — Phase 2: LLM-as-judge and the production flywheel
~13:22 — Phase 3: full-trace evals and external state
~16:24 — Phase 4: topic modeling and automated CI evals

Just because you put a robe and a cloak on an LLM, that doesn't make it inherently more trustworthy.

AI Tools, Developer Tools

Theo - t3.gg

Theo: how I code with AI changed a lot

Five months after his last workflow video, Theo has thrown it all out: he's "pretty much entirely stopped using Claude models" and now runs GPT-5.5 on the Codex harness almost exclusively, building his Lakebed project across 100+ one-at-a-time threads.^{[4]Theo - t3.gg — How I code with AI changed a lot}

Models & harness ~3:21: he struggled to use anything but GPT-5.5, only pulling Claude in for a quick landing page; effectively unlimited 5.5 inference on the $200 plan (worst weekly usage ~6% while building a whole cloud). Desktop & remote ~6:04: a good desktop app beats CLIs/SSH; he prefers T3 Code but concedes the Codex app is best for most, and calls SSH+tmux+handwritten git-worktree setups "mental illness." Codex's mobile control is genuinely good; the Codex desktop remote was "insultingly bad."

Context & prompts ~18:13: read what the model says, not just its code; keep prompts to two sentences; write agents.md by hand "almost like a letter from me to the agent"; he has "almost zero skills installed." Sequential beats parallel ~30:21: of 100+ threads in 5 days, nearly all ran solo on main; parallel worktrees are "too much context to keep track of." One two-sentence prompt specced a full feature in 1m54s; "Love it. Build it." shipped it 10 minutes later.

You're all coping. You don't need all of that. I have almost zero skills installed. Just talk to the model. They're smart enough now.

Tools: GPT-5.5, Codex (CLI/app), T3 Code, Cursor, Claude Code, OpenCode, Wispr Flow, Tailscale, Helium, CodeRabbit

AI Tools, Productivity

Nate B Jones

Nate B Jones: red-team your own AI documents

Nate Jones's core technique: never trust an AI-built deck or spreadsheet just because it opens cleanly; run a second model as a hostile reviewer.^{[22]Nate B Jones — I Built a Deck With AI, Then Made a Second AI Attack It.} He pits Codex (builder) against Claude Opus 4.7 (skeptical critic) in an autonomous "Ralph loop," and reframes knowledge work as code with agents at the center.

The verification prompt ~13:05: "Read this deck or workbook as a skeptical reviewer who suspects every claim and every number... Produce a written list of every issue found. Don't fix anything, just enumerate." That last instruction is what makes it work — flipping from generation to enumeration lets a model catch mistakes it would otherwise miss. The full loop: Codex builds → Opus 4.7 reviews and lists edits → pipe back to Codex → return to the same Opus thread to verify, repeat.
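The build, review, revise cycle described above can be sketched as a small control loop. This is a sketch under assumptions, not the tooling from the video: `build` and `review` are hypothetical callables standing in for the Codex and Opus calls, `red_team_loop` and `max_rounds` are made-up names, and the toy stubs only simulate a deck that blends actuals with plan.

```python
from typing import Callable, List, Tuple

# Prompt text as quoted in the video; the "..." marks wording elided there.
REVIEW_PROMPT = (
    "Read this deck or workbook as a skeptical reviewer who suspects every "
    "claim and every number... Produce a written list of every issue found. "
    "Don't fix anything, just enumerate."
)


def red_team_loop(
    build: Callable[[str, List[str]], str],
    review: Callable[[str, str], List[str]],
    brief: str,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Alternate builder and hostile reviewer until no issues remain.

    Returns the final draft and any issues still open after max_rounds.
    """
    draft = build(brief, [])
    issues = review(REVIEW_PROMPT, draft)
    rounds = 0
    while issues and rounds < max_rounds:
        draft = build(brief, issues)            # builder applies the listed edits
        issues = review(REVIEW_PROMPT, draft)   # reviewer only enumerates
        rounds += 1
    return draft, issues


# Toy stand-ins so the loop can run without any model calls.
def fake_build(brief: str, issues: List[str]) -> str:
    return "Q3 actuals only" if issues else "Q3 actuals + Q4 plan blended"


def fake_review(prompt: str, draft: str) -> List[str]:
    return ["Q4 plan mixed into actuals"] if "blended" in draft else []


final, remaining = red_team_loop(fake_build, fake_review, "board deck")
print(final, remaining)
```

Returning the open issues alongside the draft keeps an unverified document from being mistaken for a clean one when the round limit is hit.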

His four-stage workflow ~3:01 replaces one-shot prompting: (1) prepare sources with owner/date/type/status, (2) write a file spec before any slide or formula, (3) build constrained by the source packet, (4) verify with the hostile reviewer. Real failure modes that survive a quick look: an Excel "financial model in a costume" with a wrongly-copied growth formula, and a board deck that silently blended Q3 actuals with Q4 plan. His hot take ~0:00: no vendor can ship a generic push-button harness because deep knowledge work is irreducibly domain-specific — "reality has a surprising amount of detail."

A prompt asks for an output. A workflow defines the stages the output has to pass through before it can be trusted.

AI Tools, Productivity

Nate B Jones, Prefect, Artem Zhutov

Power-user playbook: steer Claude, run agents in parallel, persist context

A cluster of practical workflow tips. Nate B Jones: Claude externalizes its reasoning as it writes, so you can interrupt and redirect mid-stream^{[23]Nate B Jones — Why you're using Claude completely wrong} — and front-loading rich context turns it into a thinking partner that reframes your problem rather than just executing it.^{[24]Nate B Jones — The mistake everyone makes switching to Claude} Prefect's Jeremiah Lowin runs five Claudes at once across five repo copies^{[25]Prefect — Jeremiah Lowin runs 5 Claudes at once. Here's the system behind it.}; Artem Zhutov shares a three-part system to stop re-explaining your project every session.^{[26]Artem Zhutov — Stop Re-Explaining Your Project to Claude Code}

Steering Claude: watch the streaming chain-of-thought and hit stop the moment it drifts ~0:00. The bigger "switching to Claude" mistake is treating context as optional polish — ChatGPT uses context to detail what you asked for, Claude uses it to interrogate the frame: "If you give it a thin situation, you're going to get thin thinking."

Parallel agents ~0:00: Lowin keeps five terminal tabs running five Claudes in five copies of the same repo, each on a different branch, reconciled later via PRs — "my productivity exploded because I am now running at every moment." His caveat: the visible wins sit atop "hundreds and millions of lines of code that no one's using."

Persistent context ~1:01: a markdown session file (goal, context, definition of done, progress log), an Obsidian dashboard with live Dataview queries for "dynamic memory" (vs static CLAUDE.md), and a custom handoff command that runs a retrospective, writes a handoff file, and programmatically launches the next session — dropping context use from ~200k tokens back to ~4% ~9:05.

If you give it a thin situation, you're going to get thin thinking. If you give it a really rich context layer, you're going to get strategic reasoning. — Nate B Jones

Developer Tools

Github Awesome

35 Claude Code skills worth knowing

A rapid-fire tour of 35 trending open-source Claude Code / Codex skills — a strong signal of how much the "skills" ecosystem has matured, with a recurring obsession: beating AI slop.^{[27]Github Awesome — 35 Claude Code skills on GitHub}

Standouts ~0:00: 9arm-skills (a debugging discipline: reproduce → trace → falsify → cross-reference before touching code); book-to-skill (turn any technical PDF into a queryable skill); native-feel-skill (reverse-engineered from Raycast: 8 tenets, a 75-item ship audit for native-feeling desktop apps); Hallmark (anti-slop UI design: 22 themes, 65 slop-test gates, so two briefs produce genuinely different sites); agent skills eval (runs skills with/without baseline, grades with a judge model, emits an HTML report); Sprinto compliance skills (a PII detector that intercepts GDPR violations before code ships); and the Tufte skill (10 data-viz rules auto-applied to every chart).

The cross-cutting theme ~0:30: anti-slop. Louis Rossmann's team published 24 hard writing rules derived from 513,000 words of corpus analysis; Charlie Hill's 17-skill content operation has every skill read a voice profile first; PaperSpine enforces a structured pre-writing phase for academic papers.

You buy a 600-page technical book, read three chapters, then never open it again. Book-to-skill flips that.

Developer Tools, AI Tools

Better Stack

Deno open-sources Claw Patrol, an agent firewall

Deno released Claw Patrol, a gateway that sits between AI agents and the internet to solve three problems at the network layer: secret injection, action control, and observability.^{[28]Better Stack — Deno Just Open Sourced Their Agent Firewall (Claw Patrol)} Because enforcement happens in transit, it's theoretically immune to prompt injection.

Secrets (Postgres passwords, API keys, tokens) live only on the gateway; agents make requests with placeholder credentials that the gateway injects, so leaked logs or prompt injection can't expose them ~1:01. Action control uses Common Expression Language (CEL) rules evaluated per request — an agent's DROP TABLE gets a custom denial ("schema changes only land via migration PRs") — with optional human-in-the-loop (Slack) and LLM-as-judge escalation ~2:30. Everything routed through the gateway is logged with token counts and full request/response bodies. Config is a single HCL file with profiles for role-based access; a replay command catches rule regressions. Caveats: setup is tedious (all file-based) and the presenter hit issues intercepting bare IP addresses — it's very early-stage ~6:00.
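The placeholder-injection and per-request rule ideas can be illustrated with a toy gateway function. This is not Claw Patrol's implementation or configuration format: it uses plain Python regexes rather than CEL or HCL, and `SECRETS`, `DENY_RULES`, `Decision`, and `gateway` are hypothetical names chosen for the sketch.

```python
import re
from dataclasses import dataclass

# Real values live only on the gateway; the agent only ever sees the placeholder.
SECRETS = {"{{PG_PASSWORD}}": "s3cr3t-real-value"}

# Each rule pairs a pattern with the custom denial message returned to the agent.
DENY_RULES = [
    (
        re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
        "schema changes only land via migration PRs",
    ),
]


@dataclass
class Decision:
    allowed: bool
    message: str
    outbound: str = ""  # what would actually be forwarded upstream


def gateway(request_body: str) -> Decision:
    """Check rules first, then swap placeholders for real secrets in transit."""
    for pattern, message in DENY_RULES:
        if pattern.search(request_body):
            return Decision(False, message)
    outbound = request_body
    for placeholder, value in SECRETS.items():
        outbound = outbound.replace(placeholder, value)
    return Decision(True, "ok", outbound)


denied = gateway("DROP TABLE users;")
passed = gateway("connect user=app password={{PG_PASSWORD}}")
print(denied.message)
print(passed.allowed)
```

Because the agent's own request and logs only ever contain `{{PG_PASSWORD}}`, a leaked transcript in this sketch exposes the placeholder rather than the credential.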

Your agent shouldn't see secrets, and you can't see what your agent did.

Developer Tools, Productivity

Github Awesome, marimo

Two smaller tools: AIPointer and marimo's agent pairing

Two quick tool drops. AIPointer brings Google's Chromebook-only "magic pointer" to every OS — hit a hotkey, it screenshots whatever your cursor points at and queries a vision model inline.^{[29]Github Awesome — AIPointer: Hold a key, ask a question, get an answer about whatever your cursor is pointing at} And marimo added a "pair with agent" button that connects any skill-supporting agent to a running notebook.^{[30]marimo — Yes. Every Agent Now.}

AIPointer ~0:00 shows a glassmorphism overlay next to the cursor, captures a screenshot, and answers in any of seven languages without tab-switching, with BYOK across Anthropic, OpenAI, and Gemini. marimo's update ~0:00 ships presets for Claude, Codex, and OpenCode, plus a copyable prompt so Gemini, Conductor, or any other skill-capable agent can attach to a live notebook, positioning marimo as agent-agnostic.

AI Models, AI Tools

AICodeKing

Xiaomi's Mimo V2.5 Pro slashes API prices up to 99%

Xiaomi permanently cut Mimo V2.5 API pricing by up to 99% and expanded coding token plans — the entry tier is now 4.1B credits for $6/month.^{[31]AICodeKing — Mimo V2.5 Pro $6 4B Token Plan (Fully Tested): This is ACTUALLY CRAZY!} The flagship Mimo V2.5 Pro (1T total / 42B active params, 1M context) is MIT-licensed, but AICodeKing's hands-on tests rate it decent-not-exceptional.

Overseas pay-as-you-go for V2.5 Pro is now $0.036 / $0.435 / $0.28 per 1M cached-input / uncached-input / output tokens ~1:03. The separate token-plan subscription (built for OpenCode, Claude Code, Kilo Code, etc.) runs $6 / $16 / $50 / $1,000 per month for 4.1B / 11B / 38B / 82B credits, but watch the conversion: one uncached input token = 300 credits and one output token = 600, so 10M fresh-context input tokens eat ~3B credits ~3:06. On visual coding ~5:06, it passed a basic elevator sim but failed design-heavy tasks (contact-lens case, folding table). Verdict: worth trying at the price, but pair it with a stronger model for polish.
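The credit conversion is easy to check with a few lines of arithmetic. The rates and the 4.1B entry-tier allowance come from the figures above; the function and constant names are made up for this sketch, and cached-input tokens are left out because their credit rate is not given.

```python
CREDITS_PER_UNCACHED_INPUT_TOKEN = 300
CREDITS_PER_OUTPUT_TOKEN = 600
ENTRY_PLAN_CREDITS = 4_100_000_000  # the $6/month tier


def credits_used(uncached_input_tokens: int, output_tokens: int = 0) -> int:
    """Credits consumed by a workload, ignoring cached input."""
    return (
        uncached_input_tokens * CREDITS_PER_UNCACHED_INPUT_TOKEN
        + output_tokens * CREDITS_PER_OUTPUT_TOKEN
    )


used = credits_used(10_000_000)        # 10M fresh-context input tokens
share = used / ENTRY_PLAN_CREDITS      # fraction of the entry tier consumed
max_fresh_input = ENTRY_PLAN_CREDITS // CREDITS_PER_UNCACHED_INPUT_TOKEN

print(f"{used:,} credits, {share:.0%} of the entry tier")
print(f"entry tier covers at most {max_fresh_input:,} uncached input tokens")
```

On these numbers the 4.1B headline figure works out to under 14M uncached input tokens before any output is counted.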

Do not buy it purely because the credit number looks massive. Understand the credit conversion.

AI Future, AI Models

Data Science at Home

Energy-based models: why transformers are “incomplete”

A technical explainer argues LLMs are structurally incomplete: the softmax forces a next-token prediction so hallucination is architectural, and every token gets fixed compute regardless of difficulty.^{[32]Data Science at Home — Energy-Based Models Is All You Need (Why Transformers Are Incomplete)} The proposed complement is energy-based models — and a startup, Logical Intelligence, claims its EBM "Kona" solved a hard Sudoku for ~$4 vs an estimated $15,000 for a frontier LLM.

The core argument ~3:02: softmax output is always non-zero, so the model can never say "I don't know" — "you cannot make a next-token predictor refuse to predict the next token." EBMs replace P(Y|X) with an energy function E(X,Y); inference becomes optimization, descending an energy landscape to verify and refine rather than generate ~6:03. The historical obstacle is the intractable partition function Z (worked around via contrastive divergence, NCE, VICReg). The host ties it to LeCun's JEPA, itself an EBM reasoning in representation space ~13:07.
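Both halves of that argument can be shown with toy numbers. The sketch below is an illustration only, not Kona or any production EBM: the quadratic `energy` function, the `infer` routine, and its learning rate and tolerance are assumptions picked so the example stays small.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; every output is strictly positive."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Even a heavily disfavoured token keeps some probability mass,
# so "no answer" is never one of the outcomes.
probs = softmax([8.0, 0.0, -8.0])


def energy(x: float, y: float) -> float:
    """Toy energy E(x, y): lowest when the candidate y equals 2x."""
    return (y - 2.0 * x) ** 2


def infer(
    x: float, y0: float = 0.0, lr: float = 0.1,
    tol: float = 1e-6, max_steps: int = 10_000,
) -> Tuple[float, int]:
    """Inference as optimization: descend the energy in y until the gradient is flat."""
    y = y0
    for step in range(max_steps):
        grad = 2.0 * (y - 2.0 * x)   # dE/dy for the toy energy above
        if abs(grad) < tol:
            return y, step
        y -= lr * grad
    return y, max_steps


y_far, steps_far = infer(3.0, y0=0.0)     # poor initial guess
y_near, steps_near = infer(3.0, y0=5.9)   # good initial guess
print(probs)
print(y_far, steps_far, steps_near)
```

The step counts differ between the two starting points, which is the toy version of adaptive compute: a harder starting position gets more iterations, where a fixed forward pass would spend the same amount on both.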

Logical Intelligence ~19:10 — founded by physicist Eve Bodnia with LeCun as founding board chair — pitches Kona as "the world's first energy-based reasoning model" with adaptive compute, targeting domains where hallucination is unacceptable (healthcare, defense, finance). The framing: EBMs are a complement, "the reasoning layer that LLMs structurally cannot be."

Hallucination isn't actually a bug... It's what the architecture is about.

AI Models

Better Stack

Odyssey's Agora 1: a generative world model that's actually multiplayer

AI startup Odyssey demoed a four-player game running entirely inside an AI model — the first generative world model to hold a consistent shared reality across players, by splitting world-state management from rendering.^{[33]Better Stack — This Multi-Player AI Game Engine Runs Without ANY Code}

Prior world models like Google's Genie are single-player; a second participant desyncs because simulation and rendering are fused into one next-frame network ~0:00. Agora 1 splits into a state model (tracking actions, movement, health for one shared world state) and a diffusion-transformer renderer conditioned on that state — streaming four mathematically consistent viewpoints with no game engine or hand-coded logic ~1:01. The demo is a GoldenEye-style deathmatch; the real goal is infrastructure for collaborative robotics and multi-agent RL, where populations of agents co-evolve in simulation before real-world deployment.

AI Models

marimo

Why bag-of-words still breaks modern embeddings

Even frontier embedding models stay largely insensitive to word order, grammar, and negation, because cosine similarity tracks token presence more than meaning.^{[34]marimo — Why Bag of Words Still Breaks Modern Embeddings} "The lion eats a man" and "the man eats a lion" share the same bag of words, and so does the broken "eats the lion man a."

Tested in a marimo notebook across word swaps, shuffles, and negations ~2:01, models from sentence-transformers plus OpenAI text-embedding-large, Gemini, and Qwen all cluster same-bag-of-words pairs at high similarity regardless of meaning — OpenAI's heatmap nearly identical to the default model ~4:02. A subtle finding: negated sentences cluster tightly with each other regardless of opposite meaning, so the model treats negation as style, not semantic inversion. Practical guidance: fine for long-document clustering, but negation is a real production risk (a support log's "this is not the problem" vs "this is the problem") — augment with a classical negation detector or LLM classifier.
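A literal bag-of-words model shows the failure in its purest form. The sketch below is not the notebook from the video and calls no embedding model: it uses raw word counts to show why shared tokens alone yield a similarity of 1.0, then adds a deliberately naive negation check of the kind the guidance suggests pairing with embeddings. `bag`, `cosine`, `has_negation`, and the `NEGATORS` list are hypothetical.

```python
import math
from collections import Counter

NEGATORS = {"not", "no", "never", "n't"}  # small illustrative list, far from complete


def bag(sentence: str) -> Counter:
    """Word counts with order discarded."""
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm


def has_negation(sentence: str) -> bool:
    """Classical token check to run alongside an embedding lookup."""
    tokens = sentence.lower().replace("n't", " n't").split()
    return any(token.strip(".,!?") in NEGATORS for token in tokens)


swapped = cosine(bag("the lion eats a man"), bag("the man eats a lion"))
shuffled = cosine(bag("the lion eats a man"), bag("eats the lion man a"))
negated = cosine(bag("this is the problem"), bag("this is not the problem"))

print(swapped, shuffled, negated)
print(has_negation("this is not the problem"), has_negation("this is the problem"))
```

The negated pair still scores high on word overlap, so the separate negation flag is what distinguishes the two support-log sentences in this sketch.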

The grammatical correctness of a sentence in no way informs the shape of the embedding, but the words that are in the bag, that totally does.

Industry, Developer Tools

Low Level

Yellow Key: a BitLocker bypass zero-day

Security researcher Nightmare Eclipse dropped "Yellow Key," a zero-day that bypasses BitLocker — not by breaking the crypto, but by abusing the Windows 11 recovery environment's automatic NTFS transaction replay to drop into a command shell with access to the decrypted partition.^{[35]Low Level — dude wtf} Microsoft banned the researcher from both GitHub and GitLab.

A crafted USB carries malicious file-system transactions that WinRE auto-replays ~9:04, deleting winpe.ini so recovery falls back to a shell — and since the TPM has already released the BitLocker key during boot, that shell can read the encrypted disk ~8:02. It's Windows 11-specific (Microsoft re-enabled auto-FSTX). Mitigations: remove the BootExecute registry key or set a TPM PIN — though the researcher claims an unpublished variant defeats even TPM+PIN. The backstory ~0:00: a long-running dispute over Microsoft not treating admin-to-kernel as a security boundary led Nightmare Eclipse to publicly drop six+ zero-days.

This is not a vulnerability with BitLocker... it's a vulnerability in the recovery environment.

Industry

Sequoia Capital

Notion's Ivan Zhao: build a company like a religion

In a Sequoia clip, Notion CEO Ivan Zhao embraces the "cult leader" label, arguing great companies — like religions — win on a strong point of view and value system that meets human needs for meaning and belonging.^{[36]Sequoia Capital — Why Notion's Ivan Zhao likes being called a cult leader}

He holds up the Catholic Church as "one of the most successful companies of all time" — 2,000 years on consistent ritual and belief ~0:00. The interviewer riffs that Jesus was a great founder and Paul a great head of sales, with referral marketing built in. The point: tapping human nature's need for meaning is among the most durable foundations a company can have.

Industry

Dwarkesh Patel

A Neanderthal DNA mystery, solved in a Spanish pit

In a Dwarkesh Patel clip, geneticist David Reich lays out a paradox: 400,000-year-old fossils from Spain's Sima de los Huesos have Neanderthal-like nuclear DNA but Denisovan-like mitochondrial and Y-chromosome DNA.^{[37]Dwarkesh Patel — A Pit in Spain Holds the Key to a Neanderthal DNA Mystery - David Reich}

Neanderthal nuclear genomes diverge from modern humans ~700–800k years ago, yet their mtDNA diverges only ~300–450k years ago ~0:00. Reich's explanation: a small introgression (~5% of the genome) from an early modern-human-related population somehow swept its mtDNA and Y chromosome to 100% frequency — a near-impossible outcome by chance — while leaving the nuclear genome largely intact. The Sima de los Huesos fossils are the key data point.

Developer Tools

Arjay McCandless

Refresher: why cache invalidation is so hard

A clean explainer on the old chestnut: caching is trivial to add and notoriously hard to keep correct, because caches live at multiple independent layers and every write strategy carries a system-wide tradeoff.^{[38]Arjay McCandless — Cache invalidation}

The difficulty starts the moment data changes ~0:00: four common write strategies (fixed TTL, write-to-cache-first, write-to-both, write-through-bypass) each carry distinct consistency/durability tradeoffs. It compounds because caching isn't one layer — browser, CDN, server memory, external store — and third parties (including the browser) may cache without your knowledge ~1:00. Always re-checking the database to guarantee freshness defeats the entire purpose.
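Two of those strategies, fixed TTL and write-to-both, fit in a short sketch that also reproduces the stale read that makes invalidation hard. This is an illustrative toy, not code from the video: `TTLCache`, `read`, `write_both`, and the in-memory `database` dict are assumptions, and the clock is injected so the example runs without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Fixed-TTL cache: entries expire a set time after they are written."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]   # expired: force a reload from the source
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (value, self.clock() + self.ttl)


database: Dict[str, Any] = {"price": 10}  # stand-in for the source of truth


def read(cache: TTLCache, key: str) -> Any:
    """Read through the cache, falling back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = database[key]
        cache.set(key, value)
    return value


def write_both(cache: TTLCache, key: str, value: Any) -> None:
    """Write-to-both: update the database and the cache together."""
    database[key] = value
    cache.set(key, value)


now = [0.0]  # fake clock, advanced by hand
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

first = read(cache, "price")     # miss, loads from the database
database["price"] = 12           # a write that bypasses this cache layer
stale = read(cache, "price")     # still the old value until the TTL expires
now[0] = 61.0
fresh = read(cache, "price")     # expired, so the new value is loaded
write_both(cache, "price", 15)
current = read(cache, "price")
print(first, stale, fresh, current)
```

The stale read in the middle is the tradeoff in miniature: the TTL bounds how long the cache can be wrong, while write-to-both removes the window only for writes that pass through this code path.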

Sources