April 22, 2026
Anthropic briefly removed Claude Code from the $20/mo Pro plan, restricting it to the $100–$200 Max tiers, then rolled it back after backlash — though Anthropic's Amol Avasare confirmed it was an A/B test hitting ~2% of new prosumer signups[1]Simon Willison, Is Claude Code going to cost $100/month?. The same day, GitHub paused new Copilot Individual signups, tightened usage limits, and locked Opus 4.7 behind a new $39/month Pro+ tier, citing how "agentic workflows have fundamentally changed Copilot's compute demands"[2]Simon Willison, Changes to GitHub Copilot Individual plans. The economics of agentic coding are catching up to flat-rate pricing.
Anthropic's pricing page briefly showed Claude Code as unavailable on the $20/mo Pro plan, moving it exclusively to the $100 and $200/mo Max tiers — a 5x price hike for existing Pro users[1]Simon Willison, Claude Code pricing confusion. Simon's three concerns: users outside the US where $100/mo is prohibitive, trust damage from announcing pricing via employee tweets, and the strategic risk of restricting a category-defining tool as Codex gains ground. The visible change reverted fast, but the underlying A/B test appears to continue.
A tweet from an employee is not the way to make an announcement like this.
GitHub moved Claude Opus 4.7 exclusively to a pricier $39/mo Pro+ tier while discontinuing previous Opus models, paused new Copilot Individual signups, and tightened usage limits — conceding that long-running agent sessions burn far more tokens than flat per-request pricing assumed[2]Simon Willison, Changes to GitHub Copilot Individual plans. Simon's real complaint is clarity: with Microsoft shipping ~75 products branded "Copilot," it's unclear from the post which Copilot (CLI, cloud agent, code review, IDE plugin) is actually affected.
agentic workflows have fundamentally changed Copilot's compute demands
Shopify CTO Mikhail Parakhin on Latent Space says Jensen Huang's token-consumption framing is "directionally correct" while arguing the real anti-pattern is parallel-agent swarms that don't communicate[3]Latent Space, Mikhail Parakhin on CI/CD at AI Speed. Shopify funds unlimited tokens internally and steers employees to "nothing less than Opus 4.6" — token distributions are already skewed toward a handful of power users. ~09:13
Firefox CTO Bobby Holley revealed that Mozilla partnered with Anthropic, using a Claude Mythos Preview model to audit Firefox 150 and ship fixes for 271 vulnerabilities in a single release[4]Simon Willison, Quoting Bobby Holley. Holley frames it as an inflection point for defensive security.
Mozilla reprioritized other work to patch the 271 vulnerabilities Claude Mythos Preview surfaced in Firefox 150. Simon Willison surfaced the quote as a signal of how quickly AI-assisted audits are reshaping browser security[4]Simon Willison, Quoting Bobby Holley. See topic 7 for Theo's take on the same Mythos model being used to find a 27-year-old OpenBSD bug — the defensive-offensive token arms race is live.
Defenders finally have a chance to win, decisively.
Qwen3.6-27B is a 27B dense model Qwen claims beats the 397B Qwen3.5 on coding benchmarks — 15x smaller (16.8GB quantized vs 807GB), running locally at ~25 tok/s on Simon's Mac[5]Simon Willison, Qwen3.6-27B. Separately, Qwen 2.5 VL 7B is a 7B vision-language model that reads images, debugs code screenshots, and understands video locally via llama.cpp[6]Better Stack, This 7B AI Reads Images, Fixes Code & Understands Video. Consumer hardware has crossed a usability threshold.
The Q4_K_M quantized build runs locally at ~16.8GB. Simon benchmarked it at 54.32 tok/s read and ~25 tok/s generation, producing a 4,444-token pelican-on-a-bicycle SVG in 2m53s that he called "an outstanding result for a 16.8GB local model" — anatomically coherent bird, working bicycle mechanics, realistic limb positions[5]Simon Willison, Qwen3.6-27B. A second e-scooter prompt generated 6,575 tokens of creative scene description.
an outstanding result for a 16.8GB local model
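As a sanity check, the reported token count and wall time line up with the quoted generation speed (assuming the 2m53s was spent almost entirely on generation, which is an assumption — prompt processing would shave a little off):

```python
# Quick consistency check on the reported numbers. Assumption: the 2m53s
# wall time is pure generation time.
tokens = 4444                    # pelican-SVG output length
wall_seconds = 2 * 60 + 53       # 2m53s = 173 s

tok_per_s = tokens / wall_seconds
print(f"{tok_per_s:.1f} tok/s")  # prints "25.7 tok/s", matching the ~25 tok/s figure
```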
Dynamic resolution plus 4-bit quantization via llama.cpp or Ollama. Demos show it extracting text and tables from messy image data, identifying bugs in code screenshots and providing actual fixes, and processing video — all local, handling high-res images without blowing out VRAM[6]Better Stack, Qwen 2.5 VL 7B. ~00:00
Local vision models are supposed to run on your laptop, but most of them are just painfully slow. This one isn't.
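For context, driving a local vision model like this typically goes through Ollama's HTTP API, which accepts base64-encoded images alongside the prompt. A minimal sketch of building such a request — the model tag is a placeholder (substitute whatever `ollama list` reports), and the screenshot-debugging prompt is just an example:

```python
import base64
import json

# Sketch of the local screenshot-debugging workflow described above, via
# Ollama's /api/generate endpoint, which takes base64-encoded images.
# The server URL is Ollama's default; the model tag is an assumption.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5vl"  # placeholder tag; check `ollama list` for the real one

def build_payload(image_bytes: bytes, prompt: str) -> dict:
    return {
        "model": MODEL,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # one JSON response instead of a token stream
    }

payload = build_payload(b"\x89PNG...", "What bug is visible in this code screenshot?")
body = json.dumps(payload)  # POST this to OLLAMA_URL with any HTTP client
```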
Anthropic published findings from an 81,000-user Claude survey: productivity gains are U-shaped across income — software developers and delivery drivers both report mean benefits of 5.1/7 — and displacement anxiety scales directly with AI exposure[7]Anthropic Research, What 81,000 people told us about the economics of AI. Only 60% of early-career workers report personal benefit versus 80% of senior staff. Anthropic is running this monthly now via "Anthropic Interviewer"[8]Anthropic Research, Announcing the Anthropic Economic Index Survey.
First, displacement anxiety tracks exposure: every 10-point increase in observed AI exposure correlates with a 1.3-point increase in perceived job threat, and early-career workers report triple the worry of senior professionals[7]Anthropic 81K economics. Second, productivity gains are U-shaped — both the highest-paid developers and lowest-paid delivery drivers report substantial benefit (mean 5.1/7). 48% cited scope expansion as the primary benefit; 40% cited speed. Third, the speedup-anxiety paradox: the workers experiencing the biggest speedups also express the highest displacement worry. 20% voiced explicit displacement concerns, 10% said employers demanded more work rather than accepting efficiency gains, and only 3% reported negative or neutral impacts.
one developer noted building a website in 5 days that previously took months
Anthropic launched the Economic Index Survey to gather open-ended monthly responses from personal Claude account holders (minimum 2 weeks old) covering current work changes, observed organizational shifts, 12-month expectations, and 10-year vision[8]Anthropic Economic Index Survey. Framing: traditional labor metrics lag real economic shifts; qualitative user data is a faster signal for the transition.
OpenAI dropped a six-video launch for workspace agents in ChatGPT: an intro trailer plus five template demos covering software review, weekly metrics, third-party risk, lead outreach, and product feedback routing[9]OpenAI, Introducing workspace agents in ChatGPT. The pattern across all five: plain-language config, reusable "skill" definitions, scheduled triggers, and Slack/Jira/Linear/Gmail/Drive connectors. OpenAI and Anthropic also shipped two major context-automation features the same day — Codex Chronicle and Claude Live Artifacts.
~00:01 Slate handles employee software procurement requests via Slack: researches the tool, compares it against the approved stack, and auto-creates a Jira ticket if IT provisioning is needed[10]OpenAI, Software review agent.
Skills define best practices and contain the necessary instructions for an agent to perform its work accurately and consistently.
~00:02 Connects to Google Drive via an agent-owned connection (service-account style), runs every Friday on a trigger message, calculates metrics, generates charts, and delivers a shareable readout with full activity history[11]OpenAI, Weekly metrics reporting agent. ChatGPT itself helps write the underlying "metrics calculation skill."
~00:01 A finance team's existing vendor risk assessment skill is fed into ChatGPT via a workflow prompt; ChatGPT generates the full agent config — instructions, tool connections, integrations — with no engineering resources required. Completes analyses in minutes rather than hours[12]OpenAI, Third-party risk management agent.
~00:00 Matthew's SMB sales agent: picks up contact-form submissions, grades each lead against qualification criteria, sends the initial email, drafts the follow-up in Gmail, and schedules a reminder — all autonomously[13]OpenAI, Lead outreach agent.
~00:00 Scrapes web forums, groups recurring pain points, posts a daily Slack digest to product leadership, and integrates with Linear — enriches existing tickets if matches exist, creates new tickets with customer context if not[14]OpenAI, Product feedback routing agent.
My agent can only use the tools and data I give it access to.
~00:00 OpenAI shipped Chronicle, a background agent in Codex that continuously screenshots your workflow to build persistent memory — so Codex understands "that error on screen" or "the thing I was working on two weeks ago" without manual context[15]AI Daily Brief, Automating Your AI Context. Sam Altman said the internal working name was "telepathy" and Greg Brockman called it "surprisingly magical." Tradeoff: heavy token usage plus Microsoft-Recall-style privacy questions — targeted at professionals on company plans, not general consumers. ~02:00 Anthropic in parallel launched live artifacts in Claude — interactive dashboards that pull real-time data from connected apps (personalized morning brief, lunar-mission-style mission control).
The internal working name for this was telepathy, and it feels like it. — Sam Altman
Both reviewers independently concluded GPT Image 2 dominates Google's Imagen 3 (Nano Banana Pro). Nate Herk ran 30 head-to-head prompts with Claude Opus 4.7 as judge; GPT Image 2 won across artistic styles, character consistency, and UI generation[16]Nate Herk, OpenAI Image 2 is Nuts. AI Search went 28 rounds deep and clocked GPT Image 2 winning ~90% — now leading the Arena leaderboard by ~300 points, with 3:1 aspect ratios and 2K via API[17]AI Search, New AI image generator BEATS EVERYTHING.
Rather than generate manually, Nate gave Claude Code the full pipeline: generate all 30 images on both models, insert into side-by-side dashboards on localhost, and have Claude Opus 4.7 score each category[16]Nate Herk, OpenAI Image 2 10 ways. GPT Image 2 wins on realism ("shot on iPhone" vs Imagen's "too perfect"); product photography was a tie; Imagen's edge on a banking UI was mostly from pulling real logos via web search. Pricing: Imagen 4–9¢ depending on resolution, GPT Image 2 flat 6¢. ~04:55
This looks so much more real. This looks like something you might shoot on an iPhone, whereas this one sometimes just looks too good.
~06:04 Nate demos pitch-ready cereal boxes and pill bottles with accurate barcodes and nutrition facts (no text errors); handwriting scan cleanup that matches his handwriting precisely; ad creative split tests; localized ad creatives; LinkedIn carousels; restaurant menu photography; logo style variations (3D, plush, glass); real-estate interior redesign. One limitation: thumbnail generation with a reference photo degraded over multiple API calls[16]Nate Herk, 10 ways.
~00:00 GPT Image 2 wins: 100-poster anime grid (accurate characters and titles vs Imagen's misspellings), Windows 11 desktop mockup (Slack/Gmail/Excel rendered correctly), YouTube homepage, Hong Kong MTR map in Chinese and English, iPhone exploded diagram, storyboard ad generation, and the famously hard clock-showing-11:15 plus full wine glass prompt[17]AI Search, New AI image generator. Ties: Hong Kong street scene, Naruto-Gojo manga page. Both failed: biology worksheet labeling, endemic-frog grid, chess checkmate puzzle, Where's Waldo generation.
For over 90% of the instances, GPT Image was clearly better than Nano Banana.
~31:21 Free for all ChatGPT users (3–5 images/day on free tier); up to 2K via API (native UI caps at 1K); aspect ratios up to 3:1 or 1:3; strong multilingual text rendering for non-Latin scripts[17]AI Search, ChatGPT Images 2.0 Capabilities. Arena leaderboard: text-to-image lead of ~300 points, image editing lead of ~100 points.
Cal.com closed its source, citing AI-accelerated security threats — including Anthropic's Claude Mythos Preview finding a 27-year-old OpenBSD vulnerability. Theo's counter: closed source only buys time, and the real security paradigm is now "proof of work" — whoever spends more tokens wins[18]Theo, Open source is dead now?. An ASI analysis of Claude Mythos showed it was the only model to complete a 32-step corporate network attack simulation, at ~$12,500/run, with no diminishing returns up to the 100M-token cap.
~05:03 Theo's thesis: exploiting software has historically required both deep security knowledge and deep domain knowledge — a rare pairing. Open-source codebases weren't meaningfully more dangerous because attackers lacked the domain chops to weaponize the source[18]Theo, Open source is dead?. LLMs have now collapsed the domain axis toward zero.
Open source is dead. That's not a statement we ever thought we'd make… AI has fundamentally altered the security landscape. — Cal.com
~11:06 Theo estimates closed source raises the domain-knowledge bar from ~1 to ~4 of 10 today, but that advantage erodes as models improve at decompilation and deobfuscation. Tanner's counter: "If AI reading your open source code is hurting your business, you are using open source as a growth strategy instead of a philosophy. Closing it doesn't make you secure. It just means fewer good-faith developers are hardening your code"[18]Theo.
~15:09 The framing: cybersecurity now resembles crypto proof-of-work — to harden a system, defenders must spend more tokens discovering exploits than attackers spend exploiting. Open source benefits because multiple orgs can pool defensive token spend additively; attacker spend isn't additive across groups. ~23:14 An FFmpeg cautionary tale: Google found a low-severity exploit via AI-assisted scanning, gave 3+ months notice, and FFmpeg publicly dismissed it as "CVE slop." Theo: attackers don't need maintainer permission to exploit, but fixers need it to patch[18]Theo. Tie back to topic 2 — Firefox used the same Claude Mythos Preview defensively and killed 271 vulns.
Cybersecurity looks like proof of work now. Is security going to spend more tokens than your attacker?
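The additive-versus-non-additive claim can be made concrete with a toy model (all numbers are illustrative assumptions, not measurements):

```python
# Toy model of the "proof of work" framing: exploit discovery as a race where
# whoever spends more tokens finds the bug first. Illustrative only.
def defense_wins(defender_spends: list[float], attacker_spend: float) -> bool:
    # Defensive spend pools additively: every org auditing the same open-source
    # code contributes to one shared budget, because bugs found are fixed for all.
    pooled_defense = sum(defender_spends)
    # Attacker spend does not pool: rival groups duplicate each other's work,
    # so only the single best-funded attacker matters.
    return pooled_defense >= attacker_spend

# Three orgs each spending modestly out-spend one well-funded attacker...
assert defense_wins([5e6, 5e6, 5e6], 12e6)
# ...but a single org auditing the same code alone may not.
assert not defense_wins([5e6], 12e6)
```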
A context.ai employee downloaded a Roblox auto-farm hack containing Luma Stealer, which scraped Google Workspace, Supabase, Datadog, and AuthKit credentials. Attackers pivoted through an OAuth token to a Vercel employee's Google account, then to internal Vercel tools — reading non-sensitive env vars in plaintext[19]Better Stack, Someone Cheated at Roblox and Broke Vercel. A ShinyHunters-branded listing demanded $2M on breach forums. Vercel has since changed the default for new env vars to "sensitive."
~00:00 (1) A context.ai employee installs a Roblox cheat on a work laptop. (2) Luma Stealer — an info-stealer active since 2022 — scrapes live session cookies, including Google Workspace, plus keys for Supabase, Datadog, and AuthKit. (3) Attackers access context.ai's AWS environment and find a database of OAuth tokens for the legacy AI Office Suite product. (4) A Vercel employee had previously signed up for that AI Office Suite using their Vercel Google Workspace account with "allow all" permissions; their OAuth token sat in that database. (5) Attackers pivoted into the Vercel employee's Google account — no password or MFA needed — gaining access to Linear and a backend that could read non-sensitive env vars in plaintext[19]Better Stack, Vercel breach. (6) On April 19, 2026, the threat actor listed the data for $2M, posting as proof source code, NPM/GitHub tokens, employee records, and a Vercel enterprise dashboard screenshot. Vercel confirmed Next.js and Turbopack are unaffected and flipped the env var default to "sensitive."
One over-permissioned AI tool and a random employee trying to cheat at Roblox — that's all it takes to compromise one of the biggest infrastructure platforms on the web.
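The fix Vercel shipped — env vars that are "sensitive" by default — amounts to making values write-only: injectable into a running process but never readable back through tooling. A hypothetical illustration of the idea, not Vercel's implementation:

```python
# Sketch of a "sensitive by default" env var: the value can be injected into a
# process environment, but every display path (repr, dashboards, logs) sees
# only a redaction marker. Hypothetical illustration.
class SensitiveVar:
    def __init__(self, name: str, value: str):
        self.name = name
        self._value = value  # held for injection, hidden from read-back paths

    def inject(self, env: dict) -> None:
        env[self.name] = self._value  # the running process still gets the value

    def __repr__(self) -> str:
        return f"SensitiveVar({self.name}=<redacted>)"  # what tooling displays

var = SensitiveVar("SUPABASE_KEY", "sbp_example_not_real")  # dummy value
env: dict = {}
var.inject(env)
assert env["SUPABASE_KEY"] == "sbp_example_not_real"  # process sees it
assert "sbp_example" not in repr(var)                 # tooling does not
```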
SpaceX announced via tweet an option to acquire Cursor by year-end for $60 billion, or to pay $10 billion for a partnership-only path[20]Better Stack, SpaceX Is Buying Cursor for $60 Billion. The combined pitch: Cursor's product and distribution plus SpaceX's supercomputer — claimed to be equivalent to one million Nvidia H100s.
~00:00 The narrator frames the deal as: xAI was struggling to train competitive models and Cursor was burning money, so combining made sense[20]Better Stack, SpaceX Cursor. The tweet-as-announcement pattern echoes Elon's Twitter/X playbook — not yet finalized, but once Elon tweets, the deal tends to happen.
SpaceX described the partnership as a project combining Cursor's product and distribution with SpaceX's colossal supercomputer, which they claim has the equivalent compute power of a million Nvidia H100 chips.
Tim Cook is stepping down after 15 years; hardware chief John Ternus takes over. Apple also cut a reported ~$1B deal for Google's Gemini to power Siri — effectively outsourcing the frontier model while doubling down on hardware as distribution[21]AI Daily Brief, How Apple's AI Strategy Changes. Mac hardware became the de facto AI dev platform by accident via Open Claw, selling out Mac Minis nationwide.
~00:00 Apple's post-ChatGPT AI posture was nearly absent under John Giannandrea; Apple Intelligence failed to ship its promised Siri overhaul. Then Open Claw, an open-source agent harness, made Mac Minis the hardware of choice for agentic workloads[21]AI Daily Brief, Apple AI Strategy. Max Weinbach: leading AI products are Mac-only or Mac-first. Apple then paid ~$1B for Gemini to power Siri, giving Google's model a path into 2.5 billion Apple users.
If you don't have a Mac and are trying to keep up with the cutting-edge AI, you literally can't. Everything is Mac-only or Mac-first.
~03:01 Cook's legacy: Apple from $350B to $4T market cap, services to ~$100B/year. Critics note the 11x gain trailed Microsoft (14x), Google (20x), Amazon (28x), and Meta (35x), with no breakthrough product beyond AirPods[21]AI Daily Brief, Apple CEO transition. Ternus joined in 2001 and was promoted from within. Craig Federighi was reportedly passed over because he "fumbled the bag on AI and Siri." Bloomberg's Mark Gurman frames Ternus as Jobs-era decisive — he'll make the call rather than keep asking questions.
Ternus will make decisions. If you go to Tim with A or B, he won't pick. He'll ask a series of questions instead.
~07:03 WWDC and the fall iPhone launch are the upcoming catalysts. The market is cautiously optimistic; the AI industry more skeptical. The host's read: simply shipping an actually good Siri would do more than anything else[21]AI Daily Brief, Implications for Apple Intelligence.
Jensen Huang told Dwarkesh that Nvidia passed on an early Anthropic round because they weren't making external investments at the time — and because he assumed AI labs could raise from VCs like normal startups. His own words: that assumption was wrong[22]Dwarkesh Patel, Jensen Huang on Why Nvidia Passed on Anthropic.
~00:00 The multi-billion-dollar round Anthropic needed — tied to compute commitments — was only ever available from hyperscalers exchanging cash for cloud usage. Jensen admits the miss without bitterness and now counts OpenAI and Anthropic among his investments[22]Dwarkesh, Jensen on Anthropic. He calls Anthropic's founders strategically smart for structuring compute-for-equity with Google and AWS.
I always thought that they could just go raise VCs for god's sakes, like all companies do, but what they were trying to do couldn't have been done through VCs.
Even though we caused Anthropic to have to go to somebody else, I'm still happy that it happened.
Dan Shipper interviews Kieran Klaassen (GM of Cora, creator of Compound Engineering) on the "AI sandwich" thesis: humans at the front (ideate) and back (taste/polish) of any workflow, AI in the middle (plan, do, review)[23]AI and I, The AI Sandwich. Dan's AGI bar: when it's economically sensible to run an agent 24/7 that picks its own next task. We're not close. See topic 20 for Kieran's Compound Engineering tutorial.
~00:00 Cold open: humans are the bread, AI is the filling[23]AI Sandwich.
~02:03 Compound Engineering's original four steps: plan, do, review, compound — feeding learnings back into a persistent repo the agent consults next time.
~05:08 Why the "do" phase is effectively solved if the plan is good. LLMs follow good plans reliably for hours or days.
~06:08 Trevin Chow added brainstorm and ideate at the front — the human must think hard, the LLM supports.
~07:09 The back of the sandwich: after automated testing, a human clicks around and polishes until the product "feels great."
The beginning and the end; the middle is kind of solved and can be automated pretty well.
~12:18 Won't agents eventually do ideation too? Dan's counter: humans are uniquely good at reframing problems at higher levels — the "my knee hurts" example (Advil vs. IT-band stretch vs. stop running on hard surfaces).
~20:27 Language models spent the last year "in a box" — they produce generic output that isn't yours unless you set the frame. Dan's AGI bar: economically sensible to run an agent 24/7 that picks its next task.
Language models are a superintelligence that has been kept in a box for the last year and has no idea of what's going on in the world.
~23:31 Kieran's classical-composition training: the middle (practicing a piece 100 times) is rote; composing and performing are irreducibly human.
~25:33 Work sits on a rote-to-art spectrum. Lean into whatever form of beauty brings you joy — that's where LLMs give you energy instead of draining you.
Lean into what is beautiful to you because then you will find a way to utilize an LLM to make something that gives you energy instead of drains you.
Martin Kleppmann joins Gergely Orosz on the 2nd edition of Designing Data-Intensive Applications — cloud-native primitives replace local-disk assumptions, MapReduce got gutted, vector indexes and dataframes are in, and LLMs increase the need for formal verification because "we're vibe coding a bunch of stuff"[24]Pragmatic Engineer, Designing Data-Intensive Applications with Martin Kleppmann. Also: a multi-cloud hot take tied to US-EU geopolitics.
~02:01 Career origin: GoTest.it (Selenium cross-browser testing), then Reportive (Gmail extension, YC-backed). LinkedIn acquired Reportive in 2012 — "the least bad option" rather than a dream exit. Kleppmann joined the Samza team on freshly open-sourced Kafka[24]DDIA interview.
~11:05 Kafka at LinkedIn and the seed of DDIA. Jay Kreps' "The Log" framing — append-only, lowest-common-denominator abstraction for moving data across databases, activity streams, Hadoop.
~16:08 Writing the first edition took ~4 years (2.5 FTE); missed O'Reilly's deadline by 2.5 years. Research method: quiz senior LinkedIn engineers, backfill with papers, hence the heavy reference lists.
~26:16 Big structural change: databases built on object stores, where replication happens at the object-store layer instead of the database. Kleppmann pushes back on the "managed services deskill engineers" framing — he analogizes to garbage collection. Engineers should still know enough internals (B-trees vs LSM, row vs column stores) to debug weird performance[24]DDIA 2nd ed cloud-native.
If you rely on a higher level abstraction, you're no longer thinking about the lower level details… If you're building higher level business logic, actually, I think it's just fine.
~35:19 From a European perspective, Kleppmann argues multi-cloud deserves serious consideration as a hedge against a scenario where US-EU tensions escalate and European customers get locked out of US clouds. Engineers have a responsibility to surface these trade-offs[24]DDIA multi-cloud.
What if geopolitics was to go horribly wrong and tensions escalate and Europe finds itself suddenly locked out of US cloud services?
~46:25 MapReduce coverage got gutted — retained only as a teaching scaffold for Spark/Flink. Added: vector indexes (alongside other index types) and dataframes as a first-class data model[24]DDIA MapReduce. Co-author Chris Riccomini joined for industry currency.
MapReduce is dead. Nobody uses it anymore.
~52:31 Arguably the biggest hot take. Start with model checking (TLA+, FizzBee); proof assistants (Isabelle, Coq, Lean) have historically been uneconomical in industry. LLMs change the math both ways: they're getting good enough at writing proofs that previously uneconomical verification becomes feasible, and the flood of AI-generated code creates a review bottleneck humans can't absorb[24]DDIA formal verification.
LLMs increase the need for these formal proofs because we're vibe coding a bunch of stuff.
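The model checking Kleppmann recommends starting with boils down to exhaustively enumerating a system's reachable states and checking an invariant in each. A minimal stdlib sketch for a two-process test-and-set lock — illustrative only; real checkers like TLA+'s TLC add fairness, liveness, and state-space reduction:

```python
from collections import deque

# Tiny explicit-state model checker: breadth-first search over every reachable
# state of a two-process lock protocol, asserting mutual exclusion in each.
def step(state):
    pcs, lock = state
    for i in (0, 1):
        nxt = list(pcs)
        if pcs[i] == "idle":                 # decide to enter
            nxt[i] = "want"
            yield (tuple(nxt), lock)
        elif pcs[i] == "want" and lock == 0:  # atomic test-and-set acquire
            nxt[i] = "crit"
            yield (tuple(nxt), 1)
        elif pcs[i] == "crit":                # release
            nxt[i] = "idle"
            yield (tuple(nxt), 0)

def check(init):
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        assert s[0] != ("crit", "crit"), f"mutual exclusion violated in {s}"
        for t in step(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return len(seen)  # total reachable states explored

states = check((("idle", "idle"), 0))  # passes: the invariant holds everywhere
```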
~61:35 Hard problem: access control without a central server. If edit permission is revoked concurrently with an edit, peers may order events differently and diverge permanently — clocks can't help, malicious users backdate. His group is building this into Automerge without full consensus. He's also bootstrapping cryptography-for-the-physical-world research around supply-chain emissions and EU deforestation regulations (coffee, cocoa, palm oil)[24]DDIA local-first.
Software as a service businesses… are able to essentially hold a gun to the customer's head and say, "Pay us your subscription, otherwise we will delete all your data."
~72:42 Cambridge added a first-year bootcamp covering version control, unit testing, and generative AI basics. Industry rewards outputs; academia rewards the learning process. Best PhD students have industry experience first[24]DDIA education.
People in industry, I feel like sort of have short circuit reasoning… "oh, I heard this from a conference talk, I'm just going to go with that."
Shopify CTO Mikhail Parakhin joins Latent Space on what AI-speed engineering looks like at a company where nearly 100% of employees touch AI coding tools daily. CI/CD is breaking under AI-generated volume — the fix is a "very strong narrow waist" at PR review using pro-level models (GPT 5.4 Pro, Gemini Deep Think) taking turns for up to an hour[3]Latent Space, Mikhail Parakhin on CI/CD at AI Speed. Plus: Tangle, Tangent auto-research, Sim Gym, and Liquid AI.
~00:00 Hot take: Jensen is directionally correct, but parallel-agent swarms that don't communicate are the real anti-pattern. Shopify learned to use fewer agents with cross-model critique loops.
~04:05 Shopify internal AI adoption — ~100% DAU, December 2025 phase transition. Token distributions are becoming skewed to power users.
~06:08 CLI agents (Claude Code, Codex, internal "River") beating IDE tools like Cursor and Copilot. Model floor: "please don't use anything less than Opus 4.6"[3]Shopify CLI vs IDE.
It's not about just consuming tokens. The anti-pattern is running too many agents in parallel that don't communicate with each other — that's almost useless.
~11:24 A good model writes code with fewer bugs per line than the average human, but sheer volume means more bugs ship. Fix: pro-level models doing extended PR review. Mikhail hasn't found a commercial PR review tool he likes — their business models push toward cheap, high-volume review — so Shopify is building their own[3]Shopify PR review.
If you really want to stand the tide of bugs going into production, you need to spend a lot of time… Codex or Claude Code is not going to cut it. You need to have pro-level models.
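The "pro-level models taking turns" review pattern can be sketched generically. The reviewer callables below are stand-ins for real model APIs, and the stopping rule (a full clean round, bounded by a turn budget) is an assumption about how such a loop would terminate:

```python
from typing import Callable

# Sketch of a narrow-waist PR review loop: several reviewer models take turns
# on the same diff until a whole round produces no findings, or the round
# budget runs out. Reviewers here are stand-ins for real model calls.
Reviewer = Callable[[str], list[str]]  # diff -> list of issues found

def review_until_clean(diff: str, reviewers: list[Reviewer], max_rounds: int = 5):
    findings: list[str] = []
    for _ in range(max_rounds):
        round_findings: list[str] = []
        for review in reviewers:          # models take turns on the diff
            round_findings += review(diff)
        if not round_findings:            # a fully clean round ends review
            return findings, True
        findings += round_findings        # in practice: fix, re-diff, repeat
    return findings, False

# Stub reviewers standing in for two different pro-level models:
picky = lambda d: ["missing test"] if "test" not in d else []
strict = lambda d: ["no error handling"] if "try" not in d else []
issues, clean = review_until_clean("def f():\n    try: ...\n    # test added",
                                   [picky, strict])
```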
~16:29 Shopify uses Graphite stacks heavily, but Mikhail thinks git and the PR metaphor are themselves bottlenecks in an agentic world. Heretical take: "a lifelong opponent of microservices" now thinks microprocesses might make a comeback because merge conflicts become a global mutex at machine speed[3]Shopify microservices.
I'm a lifelong opponent of microservices and I always thought that was a really bad idea, and now… maybe microprocesses will make a comeback.
~20:32 Tangle — Shopify's third-gen ML experimentation platform (successor to Ether and Yandex's Nirvana) with content-hash caching, automatic deduplication across teams, one-click dev-to-prod[3]Shopify Tangle.
~28:36 Tangent — Karpathy-style auto-research loop. Real wins: search throughput went from 800 QPS to 4200 QPS at the same quality on the same hardware; the #1 user is a PM, not an ML engineer. Mikhail ran 400 experiments on a personal problem expecting to debunk auto-research and got one win he couldn't have found himself.
We went from 800 QPS to 4200 QPS with the same quality just by pure optimizations and an auto-research loop.
~38:49 Sim Gym — browser-based customer simulation trained on Shopify's AB-test history, hitting 0.7 correlation with add-to-cart events. Runs on Fireworks, CentML (recently acquired by Nvidia), and Browserbase. Liquid AI's non-transformer state-space-squared models now power low-latency search (300M params, 30ms end-to-end) and offline catalog distillation, taking share from Qwen internally on the merits.
Stripe head of design Katie Dill walks through redesigning stripe.com after six years, how AI accelerated exploration without replacing craft, and the principles her team ships under: fight mediocrity, prototype don't present, aim for MVQP not MVP, and walk the store[25]Y Combinator, How Stripe Built Their New Website.
~00:00 Why the 2020 site had to go: product sprawl outgrew the story. Over 78% of the Forbes AI 50 use Stripe, but on the old site AI was a single tile.
~05:06 Patrick Collison asked "what is the point of a website?" Conclusion: it is a manifesto — what you are, who you serve, the care you put into details[25]Stripe website as manifesto.
~06:08 New homepage anatomy: kept headline ("financial infrastructure to grow your revenue"), added a live GDP counter (a billionth of global GDP runs on Stripe), bento grid with modal overlays for progressive disclosure.
~11:11 Animation as proof of care — ball-and-line metrics animations re-tuned repeatedly; launch slipped December → January to get transitions right.
~16:15 Custom brand-merged hero images for featured customers were AI-generated. "Just ask AI for a parallelogram" produced uncanny results — bubbles in ice-cube shots, missing hands, off shadows — so each image went through detailed critique[25]Stripe AI imagery.
AI is really good at these pictures that seem super real… but it doesn't replace craft. It doesn't replace taste. It doesn't replace the attention to detail.
~19:16 Stripe built an internal tool to parameter-sweep the hero gradient wave — blur, grain, rotation, thickness, color mix, texture — then tested performance and legibility over text.
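A parameter sweep like the one described is conceptually a grid enumeration plus a scoring function. Everything below — the parameter names, ranges, and the toy legibility score — is an illustrative assumption, not Stripe's actual tool:

```python
import itertools

# Sketch of a hero-gradient parameter sweep: enumerate every combination of
# visual parameters and score each. The score here is a placeholder for
# Stripe's real performance/legibility tests.
grid = {
    "blur":     [0, 4, 8],
    "grain":    [0.0, 0.2],
    "rotation": [0, 15, 30],
}

def score(params: dict) -> float:
    # Toy heuristic: prefer mild blur and low grain so overlaid text stays legible.
    return -abs(params["blur"] - 4) - params["grain"] * 10

candidates = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
best = max(candidates, key=score)  # first maximal combination wins ties
```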
~24:18 Bento iterations: cramped all-in-one, scrolly-telling, accordion. User research killed the accordion because people don't click tabs in lean-back browse mode.
~29:22 AI raises the design floor — baseline sign-up modals get to 7/10 fast — so designers should redirect their found time into new paradigms like agent experience design: what does UX look like when agents are traversing your product?[25]Stripe agent experience
People are using agents to build their businesses now… so what is your agent experience? How good is that?
~35:26 Core principles: fight the gravitational pull to mediocrity (good enough compounds into a mediocre company), prototype don't present, aim for a minimum viable quality product, and "walk the store" — a cultural practice where everyone including founders uses the products end-to-end weekly[25]Stripe MVQP.
You don't need to accept slop and you shouldn't accept slop. You should hunt for, fight for the right solution.
Matt Williams and Ryan catch up on Claude Design (which Matt finds frustrating), Kimiko 2.6 on a 14-node Raspberry Pi cluster, Datadog's State of AI Engineering report, Google DeepMind's WeatherNext 2 world model, and Row Zero — a spreadsheet that chews through billions of rows[26]Matt Williams, Matt and Ryan have a chat. Runs through Matt's "time to all caps" metric for Opus 4.7 and the vibe-indistinguishability of 4.5/4.6/4.7.
~01:00 Claude Design rant — Matt's new "time to all caps" metric for how quickly a model frustrates him into shouting. Opus 4.7 isn't meaningfully better than 4.6 or 4.5 and is probably the wrong default for a design app; Sonnet or Haiku would be better[26]Matt Williams Claude Design.
~05:00 Kimiko 2.6 on a 14-node Raspberry Pi cluster. Not on Ollama yet, estimated ~100GB memory. SWE-bench Pro is on par with Anthropic/OpenAI/Google models, but Terminal-Bench 2 is what Matt cares about — best predictor of quality for Claude-style assistants.
~08:03 Datadog State of AI Engineering methodology: it traces anything OpenAI-API-compatible plus LangChain-style frameworks, so it can't see purely offline work (llama.cpp, DeepSeek direct).
~11:04 Prediction of a three-to-five-horse race. Growing appetite among US firms for Chinese open-source models: they're getting good, they're open source, and you can run them offline.
I'm seeing a lot more appetite from US firms to use Chinese or non-US models. One because they're getting so good, two because they're open source, and three because like you can grab it offline.
~14:06 Ryan: "I literally can't tell the difference between 4.5, 4.6 and 4.7 when it comes to Opus." Vibe-based engineering on opaque model swaps has become normalized[26]Matt Williams vibe engineering.
~30:21 Matt loads Swyx/Latent Space's JSON vector catalog into DuckDB, scrapes two-day live streams, runs transcripts through an LLM, and clusters themes ("post-agentic," "progressive disclosure of MCPs"). Standout talk: Google DeepMind's WeatherNext 2, a world model on Vertex that cubes the Earth's atmosphere (subtracting the Earth) to simulate atmospheric dynamics for weather prediction. Ryan's Puget Sound hail-and-waterspout story ties back — predicting events like that requires atmospheric modeling, not surface data[26]WeatherNext 2.
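The load-then-query step Matt describes maps to a few lines of SQL. DuckDB isn't in the Python stdlib, so this sketch uses sqlite3 and json instead to show the same shape; the catalog fields are guessed, not Swyx's actual schema:

```python
import sqlite3

# Stand-in for a talk-catalog JSON already parsed into dicts.
talks = [
    {"title": "WeatherNext 2", "speaker": "Google DeepMind", "theme": "world models"},
    {"title": "Post-agentic", "speaker": "Unknown", "theme": "agents"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talks (title TEXT, speaker TEXT, theme TEXT)")
conn.executemany("INSERT INTO talks VALUES (:title, :speaker, :theme)", talks)

# A GROUP BY is a cheap first pass before LLM-based theme clustering.
rows = conn.execute(
    "SELECT theme, COUNT(*) FROM talks GROUP BY theme ORDER BY theme"
).fetchall()
print(rows)  # [('agents', 1), ('world models', 1)]
```

With DuckDB proper, the JSON file can be queried in place without the insert step, which is the draw of Matt's workflow.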
~39:27 Claude Design defaults to Anthropic-brand pastels even when told "no pastels," insists on left-side toolbars and 1-pixel separators, can't do primary colors or Bauhaus/Teenage Engineering, can't do dense CJK layouts, and lies about designing professional slide decks. Hot take: browser tabs belong on the left, not the top — Claude can't conceive of this.
It just feels like it's so opinionated with the wrong opinions.
~49:30 Hosted spreadsheet handling tens of millions to billions of rows with pivot tables, database integrations, and Python custom functions. Ryan demos a 26M-row US flights dataset from the Bureau of Transportation Statistics[26]Row Zero.
Legora CTO Jacob Lauritzen argues chat is a low-bandwidth, one-dimensional interface that breaks down for long-running vertical agents. The economics have flipped: doing work is cheap, planning and reviewing are the new bottleneck. Agents and humans should collaborate inside high-bandwidth persistent artifacts — documents, tabular reviews — with skills, elicitation, and decision logs[27]AI Engineer, Agents need more than a chat.
~00:07 Opening failure mode: you ask a long-running agent to draft a contract, it spawns sub-agents, runs 30 minutes, hits compaction — "that's when you know you can give up, it's going to forget everything, it's in the context rot state" — and returns a contract where clause three is wrong.
~02:10 Thesis: planning and reviewing are the new bottleneck, not doing. Legora is a collaborative AI workspace for law firms with 1,000+ customers across 50+ markets — possibly the fastest-growing vertical AI company in history[27]Lauritzen thesis.
Planning work and reviewing work is the new bottleneck.
~03:13 Jason's verifier's rule extended: solvable + easy to verify = AI will solve it. In legal, checking definitions in a contract is trivially verifiable; drafting contract language is only truly verified in court; litigation strategy is essentially unverifiable.
~05:14 Trust vs control: proxies (golden contracts, browser-test-driven development), decomposition (human picks risk profile and negotiation stance; agent handles formatting and definition linting), and guardrails (file/directory/site restrictions).
~08:14 Pure chat gives control only at the root of the work DAG. Planning lets you steer up front but forces you to think before any work is done — it can't anticipate surprises like a special clause discovered mid-task. Lauritzen predicts planning won't stick around[27]Lauritzen DAG.
~10:16 Skills encode human judgment at the nodes of the work tree. Elicitation should never block — the agent makes a decision to unblock itself and writes the choice to a decision log the human can later reverse.
~11:18 Legora's UX: a document where you highlight clause three and only clause three changes, tag agents and collaborators, and hand off sections to specialist agents. A "tabular review" primitive — the agent reviews many contracts and flags items it wants human input on — gives high control and easy review[27]Legora artifacts.
~13:21 Closing: chat-as-input is great; chat-as-main-mode-of-collaboration isn't. Language is the universal human interface only because humans are limited to it.
Agents aren't humans, and so we should not constrain them to human language.
Cloudflare built Artifacts, a distributed Git-compatible file system on Durable Objects, aimed at letting agents create, fork, and manage thousands of repos at scale[28]Better Stack, GitHub Was NOT Made for AI Agents. The Git server is written in Zig, compiled to WASM, and runs on a Durable Object; data is persisted in the Durable Object's SQLite. Currently in private beta.
~00:01 GitHub's social layer (stars, followers, discussions) is irrelevant to agents. Artifacts exposes Git protocol + HTTP for any-language clients and supports per-session agent workspaces, parallel PR reviews, and automated large-codebase refactoring. Demo pattern: import a baseline repo, fork N times in parallel, one fork per task, with an Anthropic-powered agent making changes in each fork via isomorphic Git for in-memory clone/commit/push[28]Cloudflare Artifacts. Current limit: no Git diff command exposed yet.
GitHub was built for humans, not for agents… Cloudflare have built a basic Git implementation in Zig, compile it to WASM, and put it on a Durable Object to act as a Git server.
A cluster of agent-plumbing launches: Claude Context for semantic code search via MCP[29]AICodeKing, Claude Context, Microsoft's MarkItDown for document-to-markdown ingestion[30]Better Stack, MarkItDown, pi-computer-use wiring Anthropic computer-use into the Pi CLI[31]Github Awesome, pi-computer-use, Elastic Agent Builder as an MCP server for Elasticsearch[32]Arjay McCandless, Elastic MCP Server, and OpenRouter Workspaces for per-project keys and routing[33]OpenRouter, Introducing Workspaces.
~00:02 Zilliz's open-source MCP plugin indexes repos into a vector DB, retrieves relevant chunks on demand. Hybrid search (dense + BM25), AST-based chunking, incremental indexing via Merkle trees, multi-project support. Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Windsurf, Cline, VS Code, Zed, and any MCP client. ~40% token reduction at equivalent retrieval quality (self-reported)[29]Claude Context. Local deployment path via Milvus + Ollama if you don't want to send embeddings to OpenAI.
The model is often smart enough. The issue is getting the right code in front of it without wasting time or tokens.
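The Merkle-tree incremental indexing mentioned above boils down to hashing files bottom-up and re-embedding only what changed. A minimal stdlib sketch of that idea (not Zilliz's implementation):

```python
import hashlib

def file_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def tree_hash(files: dict[str, bytes]) -> str:
    """Hash of a whole tree: hash of its sorted (path, file-hash) pairs."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(file_hash(files[path]).encode())
    return h.hexdigest()

def changed_paths(index: dict[str, str], new_files: dict[str, bytes]) -> list[str]:
    """Only files whose hash differs from the stored index need re-embedding."""
    return [p for p, c in new_files.items() if index.get(p) != file_hash(c)]

repo_v1 = {"a.py": b"print(1)", "b.py": b"print(2)"}
index = {p: file_hash(c) for p, c in repo_v1.items()}
repo_v2 = {"a.py": b"print(1)", "b.py": b"print(3)"}  # only b.py changed
print(changed_paths(index, repo_v2))  # ['b.py']
```

If the root tree hash is unchanged the indexer can skip the repo entirely; otherwise it walks down to the changed leaves.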
~00:00 Microsoft Research's open-source (MIT) tool, 110k GitHub stars, converts PDF/Word/Excel/PowerPoint/audio/images to clean markdown in one call, preserving headings and tables. Ships with an MCP server; image descriptions via LLM plugin. Tradeoff: speed and simplicity over the deeper extraction of Unstructured or DocQuery[30]MarkItDown.
~00:00 Wires Anthropic's computer-use API into the Pi CLI on macOS. Tell the agent "build a login screen and test if the hover state works" — it writes React, spins up the dev server, takes control of the mouse, opens Chrome, clicks the button, and visually verifies[31]pi-computer-use.
~00:00 Expose an MCP endpoint from your Elasticsearch cluster. Add frequently-run queries as tools, paste the MCP URL into Claude, and Claude can discover indices, search, and generate queries from natural language[32]Elastic MCP.
Per-workspace isolated environments: API keys, guardrails, cost/latency/quality routing, saved presets, BYOK provider keys, observability integrations, and member roles. Aimed at teams running staging vs production or multiple projects[33]OpenRouter Workspaces.
Github Awesome's 35 self-hosted projects rundown — highlights Trailbase (sub-ms SQLite Firebase alt), Octapota OS (memory OS for agents with persistent semantic search), Squarebox (local-first MCP server), and Models (OpenAI-compatible local inference server)[34]Github Awesome, 35 Self-hosted Projects. DeepLearningAI + Snowflake shipped a multimodal RAG course covering audio, images, and video from meeting recordings[35]DeepLearningAI, Multimodal RAG.
Kieran Klaassen (same Kieran from topic 12) runs a 50-minute Claude Code tutorial on Compound Engineering — his plan/work/assess/compound loop that makes Claude Code improve each session by writing learnings back to CLAUDE.md and a searchable docs/ tree[36]Kieran Klaassen, How to Make Claude Code Better Every Time. Plus: Playwright as a zero-overhead QA team, the mental model for slash commands vs sub-agents vs skills, and the & trick to push sessions to mobile.
~00:00 Four phases. (1) Plan: sub-agents research the codebase, look up framework docs, produce a spec. (2) Work: fresh context, load plan, execute while asking clarifying questions. (3) Assess: multi-persona review agents (security, architecture, simplicity, agent-native, DHH-style opinionated) write P1/P2/P3 issues to a to-dos folder. (4) Compound: capture insights into docs/ (searchable by front-matter) or append to CLAUDE.md[36]Compound Engineering loop. The plugin took him a year to build and is on GitHub.
AI can learn, which is really cool. So if you invest time to have the AI learn what you like and learn what it does wrong, it won't do it the next time.
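The "searchable by front-matter" piece of the Compound phase can be sketched with a few lines of stdlib parsing; the file layout and field names below are assumptions, not Kieran's plugin:

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Read simple `key: value` front-matter between leading --- fences."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def search_docs(docs: dict[str, str], **wanted: str) -> list[str]:
    """Return doc names whose front-matter matches every wanted field."""
    return [
        name
        for name, text in docs.items()
        if all(parse_front_matter(text).get(k) == v for k, v in wanted.items())
    ]

docs = {
    "playwright.md": "---\ntopic: testing\nseverity: P1\n---\nUse real browsers.",
    "auth.md": "---\ntopic: security\nseverity: P1\n---\nRotate keys.",
}
print(search_docs(docs, topic="testing"))  # ['playwright.md']
```

Captured learnings stay cheap to write (plain markdown) while remaining queryable by field, which is what lets later sessions load only the relevant notes.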
~29:00 Opus 4.5 is the first model Kieran considers reliable enough for Playwright-driven browser automation. The flow: run the Playwright test command, Claude writes a test plan for the new feature, launches a real browser, tests, and if something breaks fixes the code and immediately retests. A separate command screen-records the flow, compresses it, and attaches it to the pull request[36]Playwright QA.
You don't even need to write the test. You just say "yo, just test it." It's like a QA team basically.
~46:30 Slash commands are user-triggered business logic that can chain other slash commands (LFG is a list of commands run as a pipeline). Sub-agents are for specialized/parallel work without polluting the main context. Skills are just-in-time markdown files Claude loads when relevant; the description should say "I can do X" not "call me when X"[36]Slash vs sub-agents vs skills.
~41:30 Kieran aliases cc to Claude Code with --dangerously-skip-permissions; real safety gates are PR review and merge. Alternative: type "add this to permissions" to build an allowlist[36]Permissions and tricks. Lesser-known: typing & (ampersand) in the Claude Code terminal pushes the session to the web/mobile app; "Open in CLI" does the reverse. Kick off an LFG run, push to the background, check progress on your phone.
Just make sure that if you do this, don't share your SSH secrets with it that can delete your production data.
Nate frames Karpathy's viral wiki idea (41,000 bookmarks) against his own OpenBrain project as two answers to one question: where should the AI do the hard thinking?[37]Nate B Jones, Karpathy's Wiki vs. Open Brain. Karpathy's wiki is a write-time system; OpenBrain is a query-time SQL system. Nate's proposed synthesis: SQL as source of truth, wiki as compiled view.
~00:00 Karpathy: AI builds and maintains a personal wiki as plain folders and text files that Obsidian reads. OpenBrain: data stored faithfully in SQL tables; synthesis happens fresh on each query[37]Karpathy vs OpenBrain.
Deciding how you organize your context layer is one of the single most important things you can do in 2026.
~06:04 AI makes editorial decisions when it synthesizes. Correct 80-90% of the time; the 10-20% misframings get baked in invisibly. Karpathy keeps raw sources separate, but most adopters won't. Nate's sharpest contrast: a neglected database looks like ignorance (visible gaps), a neglected wiki drifts into active misinformation because the old synthesis reads with the confidence of well-written prose[37]Wiki trap. A smart wiki might "helpfully" resolve a contradiction (engineering says 12 weeks, sales promised 8) into a smoothed 10-week narrative, destroying the strategic signal.
Database staleness can look like ignorance… wiki staleness looks like active misinformation because you don't know that you're wrong because the page reads like it knows what it's talking about.
~22:11 Karpathy's wiki is built for 100–10,000 high-signal documents — not corporate-level memory. Multi-agent writes from Claude Code, ChatGPT, Cursor, and scheduled automations break it because two agents editing the same markdown page produce a merge mess. "Optimized for papers and articles speed, not Slack message and ticket update speed." OpenBrain's weaknesses: deep synthesis of 15+ facts is unpredictable with no pre-built map, no browseable artifact, and contradictions sit silently unless you query for them[37]Scale and teams.
~30:19 A compilation agent runs on schedule reading from OpenBrain, forms a knowledge graph, synthesizes across entries, and produces topic summaries as wiki pages browsable in Obsidian. The wiki is never edited directly — if it's wrong, you fix the source row in SQL and regenerate[37]Hybrid solution. Nate is launching an OpenBrain contradiction-surfacing plugin plus a graph/wiki compilation plugin.
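The regenerate-never-edit pipeline above is the crux of the hybrid. A toy sketch with sqlite3, using the 12-weeks-vs-8-weeks contradiction from earlier in the talk (schema and wording are assumptions, not OpenBrain's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (topic TEXT, claim TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("timeline", "engineering estimates 12 weeks", "eng-standup"),
        ("timeline", "sales promised 8 weeks", "crm-note"),
    ],
)

def compile_page(conn: sqlite3.Connection, topic: str) -> str:
    """Regenerate the wiki page from source rows; never edit the page itself."""
    rows = conn.execute(
        "SELECT claim, source FROM facts WHERE topic = ? ORDER BY rowid", (topic,)
    ).fetchall()
    lines = [f"# {topic}"] + [f"- {claim} ({source})" for claim, source in rows]
    return "\n".join(lines)

print(compile_page(conn, "timeline"))
```

Because the page is a pure function of the rows, contradictory claims stay visible side by side instead of being smoothed into a fictitious 10-week narrative; fixing the page means fixing a source row and recompiling.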
~37:24 Nate's deepest takeaway from Karpathy: the paradigm shift is from Oracle (one-off answer engine) to Maintainer (ongoing job making a knowledge artifact better). Humans curate, think, select, ask; AI does grunt work in a sustained, compounding way. Karpathy's "idea file as a publishing format" is a new way to share technical knowledge that respects reader agency[37]Oracle to Maintainer.
Your company's AI generated knowledge right now is either a compounding asset or it's just a growing pile of noise.
Four shorter items. StrongDM's digital-twin integration testing is a live production AI software factory[38]Nate B Jones, Why Manual Testing Is Dead. Lenny's Podcast: PM renaissance and exhaustion, side-by-side[39]Lenny's Podcast, PM industry. Better Stack's early status-page SEO hack[40]Better Stack, Early growth hacking. Real Python on list comprehensions[41]Real Python, Custom Python List Comprehensions.
StrongDM built behavioral "digital twin" clones of every external service they integrate with — Okta, Jira, Slack, Google Docs/Drive/Sheets — so AI agents can run full integration tests without touching production. Output: 16,000 lines of Rust, 9,500 lines of Go, 6,700 lines of TypeScript in their CXDB AI context store, built end-to-end by agents[38]StrongDM digital twin. Host endorses the "$1,000 per human engineer per day on AI compute" threshold benchmark.
If you haven't spent a thousand per human engineer, your software factory has room for improvement.
Three years ago, product leaders were unhappy — "responsibility without authority." AI has changed that: PMs build and test directly with less dependency on others. But the industry is exhausted, and large employers are simultaneously shedding tens of thousands of workers while paying triple wages to new hires[39]Lenny PM industry.
Tens of thousands of people are being shed by larger employers that are also hiring and paying triple wages. That's like mind-boggling.
Early growth tactic: customers pointed status.yourcompany.com at Better Stack-powered pages, each subdomain becoming a backlink and SEO signal back to Better Stack's product landing pages. Cheap, blended engineering with marketing, drove organic search — then competitors copied it[40]Better Stack growth hack.
Quick overview of the [expression for item in iterable] pattern, readability tradeoffs, and generator expressions (same syntax, parentheses instead of brackets) for yielding without materializing a full list[41]Real Python list comprehensions.
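The syntax contrast the article covers, in two lines: same expression, brackets materialize a list, parentheses yield lazily:

```python
nums = range(1, 6)

# List comprehension: builds the full list in memory.
squares = [n * n for n in nums]   # [1, 4, 9, 16, 25]

# Generator expression: identical syntax with parentheses; values are
# produced one at a time, so nothing is materialized.
lazy = (n * n for n in nums)
total = sum(lazy)                 # consumes the generator: 55
print(squares, total)
```

The generator form is the right default when the sequence is only consumed once, especially for large inputs.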