May 21, 2026
OpenAI says an internal general-purpose reasoning model autonomously disproved Erdős' 1946 unit-distance conjecture using algebraic number theory, and that the result has been independently verified by Tim Gowers, Noga Alon, and Thomas Bloom.[1] (The Rundown AI: OpenAI cracks an 80-year math belief) Sam Altman called it "kinda big." OpenAI is framing it as the first AI-driven discovery of genuinely novel mathematics, in line with its working definition of "Level 4" AI: systems that can make original contributions across disciplines rather than just synthesizing existing knowledge.
Erdős' 1946 conjecture is about the maximum number of same-length connections (unit distances) that can link points in a plane — a stubbornly open problem in combinatorial geometry. The Rundown reports the model's argument leans on algebraic number theory, an unusual route for a problem that historically attracted combinatorial approaches.[1]The Rundown AI
The Rundown adds a useful caveat: OpenAI previously overclaimed a GPT-5 result in 2025 that turned out to be a literature finding rather than a true discovery. This one, however, has the names attached — Gowers, Alon, Bloom — so it carries more weight. The strategic angle is the "Level 4" branding: OpenAI is trying to put a stake in the ground for "AI as autonomous researcher," distinct from the model-router and agent-harness conversations of the past month.
"kinda big" — Sam Altman
Artificial Analysis benchmarked Cursor's new Composer 2.5 model and put it third on its Coding Agent Index with a score of 62, behind Claude Opus 4.7 (66) and GPT-5.5 (65), at $0.07/task in standard mode vs. roughly $4.10–$4.82/task for the top two.[2] (Artificial Analysis: Cursor's Composer 2.5) The "Fast" variant costs $0.44/task and completes work in 6.7 minutes (third fastest), but at roughly 6× the standard-mode cost.
Composer 2.5 is built on continued training of Moonshot AI's open-weights Kimi K2.5 model, with Cursor reporting that ~85% of total compute came from its own additional training and reinforcement learning.[2]Artificial Analysis Benchmarks span SWE-Bench-Pro-Hard-AA (+35 points over base K2.5), Terminal-Bench v2, and SWE-Atlas-QnA.
The cost story is the headline. Independent of how the benchmark numbers shake out, paying $0.07 per coding task instead of $4+ is a structural shift — Cursor is effectively buying market share from the frontier-priced incumbents while still landing on the same podium. If the quality holds in real-world use, this is the first big crack in Opus 4.7's pricing moat since launch.
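As a quick check on the price gap, the reported per-task figures can be turned into ratios. This is a minimal sketch: the prices are the ones quoted above, while the function names and the 1,000-task monthly volume are illustrative assumptions, not figures from Artificial Analysis.

```python
# Per-task prices (USD) as quoted in the write-up above.
COMPOSER_STANDARD = 0.07
COMPOSER_FAST = 0.44
FRONTIER_LOW, FRONTIER_HIGH = 4.10, 4.82  # reported range for the top two models


def cost_ratio(price: float, baseline: float = COMPOSER_STANDARD) -> float:
    """Return how many times more expensive `price` is than `baseline`."""
    return price / baseline


def monthly_cost(price_per_task: float, tasks_per_month: int) -> float:
    """Total spend for a task volume at a flat per-task price."""
    return price_per_task * tasks_per_month


print(f"Fast vs. standard: {cost_ratio(COMPOSER_FAST):.1f}x")
print(f"Frontier vs. standard: {cost_ratio(FRONTIER_LOW):.1f}x to {cost_ratio(FRONTIER_HIGH):.1f}x")

# Hypothetical volume of 1,000 tasks per month, purely for illustration.
print(f"Standard, 1,000 tasks: ${monthly_cost(COMPOSER_STANDARD, 1000):,.2f}")
print(f"Frontier, 1,000 tasks: ${monthly_cost(FRONTIER_LOW, 1000):,.2f} and up")
```

On these numbers the Fast variant comes out at about 6.3× the standard price, and the two frontier models at roughly 59× to 69×.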
Cohere released Command A+ as open weights, a year after the original Command A. It scores 37 on the Artificial Analysis Intelligence Index — the same tier as Claude 4.5 Haiku — but ranks first on AA-Omniscience Non-Hallucination at 86%, and runs at 281 output tokens/sec on Cohere's API.[3]Artificial Analysis — Command A+
Other reported benchmarks: ~11% HLE, ~76% GPQA Diamond, ~25% Terminal-Bench Hard, ~38% SciCode, 63% MMMU-Pro.[3]Artificial Analysis The headline is the non-hallucination result — Command A+ tops the leaderboard there, which makes it interesting as a retrieval-grounded model for enterprise RAG even when the raw intelligence number is mid-pack.
Artificial Analysis didn't publish parameter count, license terms, or pricing in this write-up — they link out to artificialanalysis.ai/models/command-a-plus for that. Year-over-year, Cohere's open-weights cadence is well behind the open-weights leaders (Kimi, DeepSeek, GLM), but a clear specialization in factuality may be a more defensible position than chasing the frontier.
Nvidia posted its 15th consecutive top-line beat and 14th straight EPS beat, raised its dividend from $0.01 to $0.25 per quarter, and authorized an $80B buyback expansion.[4] (Sherwood Snacks: Nvidia beats again) Tech Brew's frame is bigger: with the first Vera CPUs shipping to Anthropic, OpenAI, xAI, and Oracle, Nvidia is now opening a second front against Intel and AMD. Jensen Huang pegs it as a "$200 billion" market.[5] (Tech Brew: Nvidia's CPU era has arrived)
Sherwood doesn't publish exact revenue/EPS but frames the print as broadly strong with well-received Q2 guidance.[4]Sherwood Snacks Analyst attention is on Blackwell and the upcoming Rubin generation, with sales projected to top $1T through 2027. Nvidia entered earnings up ~20% in 2026, the second-best Magnificent 7 performer — but trailing semiconductor peers concentrated in memory, networking, and CPUs. The newsletter flags a recurring pop-then-fade pattern post-earnings.
"I want the business." — Jensen Huang, on Vera CPUs
Tech Brew reports Vera was announced in March and started shipping within two months to Anthropic, OpenAI, xAI, and Oracle.[5]Tech Brew Nvidia projects $20B in CPU sales for 2026 alone. The framing: agentic AI and physical robotics are CPU-intensive workloads, not pure GPU ones, so Nvidia is filling the gap before Intel and AMD can. Both incumbents are responding — Intel is tightening quality, AMD is building rival chips — while Meta and Amazon continue their homegrown alternatives. Tech Brew calls it "the beginning of the CPU wars."
Simon Willison shipped the first alpha of Datasette Agent — an extensible, plugin-based AI assistant that wraps Datasette and translates natural-language questions into SQL. A live demo runs at agent.datasette.io on Gemini 3.1 Flash-Lite; local models like gemma-4-26b-a4b via LM Studio are supported too.[6]Simon Willison — Datasette Agent
Datasette Agent combines Willison's three-year-old LLM Python library with Datasette so users can ask questions against any database: the agent generates SQL, runs it, and shows results.[6] (Simon Willison: Datasette Agent) Same day, he shipped three companion releases:
Worth watching as a reference implementation: the architecture leans into Willison's existing LLM ecosystem, so plugins compose cleanly with whatever model backend you bring. The "View SQL query" UX pattern is a small but pointed bet on transparency over magic — every chart and table answers "how did you get this?"
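The question-to-SQL flow and the "View SQL query" pattern described above can be sketched in a few lines. This is a minimal illustration using only the standard library, not Datasette Agent's actual code or API: `question_to_sql` is a hypothetical stand-in for the model call, and the `releases` table and its rows are made up for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Answer:
    question: str
    sql: str    # kept next to the rows so a UI can always show how the result was produced
    rows: list


def question_to_sql(question: str) -> str:
    """Hypothetical stand-in for the model call; a real agent would prompt an LLM here."""
    canned = {
        "how many releases are there?": "SELECT COUNT(*) FROM releases",
    }
    return canned[question.lower()]


def ask(conn: sqlite3.Connection, question: str) -> Answer:
    """Generate SQL for the question, run it, and return the rows together with the SQL."""
    sql = question_to_sql(question)
    # Sketch-level guard only: refuse anything that is not a plain SELECT.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError(f"refusing non-SELECT statement: {sql!r}")
    rows = conn.execute(sql).fetchall()
    return Answer(question=question, sql=sql, rows=rows)


# Made-up data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (name TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO releases VALUES (?, ?)",
    [("demo-a", "0.1a0"), ("demo-b", "0.2")],
)

answer = ask(conn, "How many releases are there?")
print(answer.sql)
print(answer.rows)
```

Returning the SQL alongside the rows, rather than only the rows, is the design choice that makes the transparency pattern cheap: the interface never has to reconstruct how an answer was derived.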
Google rolled out a clutch of Play Store changes at I/O 2026 — short-form app previews ("Play Shorts"), a conversational search ("Ask Play"), and a new distribution surface inside the Gemini assistant on Android and web that bypasses the Play Store entirely.[10]Google Blog — Google Play I/O 2026
The post is light on developer APIs and heavy on discovery surfaces. The five highlights:[10]Google Blog
The pattern: Google is pushing developers toward a multi-surface distribution model with Gemini as the connective layer. For ASO-heavy teams, "Ask Play" matters most — the SEO playbook for the Play Store is about to change.
Issue 652's picks lean toward fundamentals over fashion: a critique of A/B-test randomization errors, an argument that measurement (not modeling) is still the core of data science in the AI era, and a hands-on transformer-from-scratch tutorial.[11]Data Science Weekly — Issue 652
Top picks from the issue:[11]Data Science Weekly
Beyond the Erdős story, The Rundown's secondary bullets are unusually dense — a Gemini-powered Co-Scientist hit 91% scarring reduction in a Stanford liver-fibrosis study, Emergence's town sim showed wildly different agent behaviors (Claude: 0 crimes; Grok: 200+, all agents dead by day 4), and Intuit announced 17% workforce cuts on an AI pivot.[1]The Rundown AI
Notion's Ivan Zhao sits down with HubSpot's Dharmesh Shah at Sequoia and walks through twice "refounding" the company: first in Kyoto pre-PMF, then in Cancun after GPT-4. The argument: building with LLMs is "brewing beer, not engineering bridges," so Notion now hires barbell teams (very junior + very senior), has killed the CMO org, replaces SaaS planning with weekly product "jazz," and treats the company as a "jazz band, not a marching band."[12]Sequoia × Notion's Ivan Zhao
~00:00 — Jazz band, not marching band. Zhao opens with the core operating metaphor: classic SaaS hierarchies are marching bands optimized for predictability; building with LLMs requires the opposite — small, taste-driven units improvising against ambiguous outputs.[12]Sequoia × Notion's Ivan Zhao
~03:02 — Five years of pre-PMF despair. Zhao recounts the years before Notion clicked — building, scrapping, rebuilding. The throughline: most founders quit before the second refounding.
~06:03 — Building with LLMs is brewing beer. The pivotal frame: deterministic engineering doesn't apply to language models. You tune the yeast, you don't tell it where to go.
"Building classic software is like engineering a bridge… building with language models is like brewing beer — you cannot tell the yeast, 'hey, go toward that flavor profile more.'"
~10:04 — Hierarchy, flatter orgs, and barbell hiring. Zhao argues that since LLMs normalize capability, the differentiated inputs are taste and agency — which favor very junior people (high agency, low cost) and very senior people (high taste). Mid-level layers become hard to justify.
"Talent equals capability times taste times agency. Language models normalize capability, so we optimize for the latter."
~15:07 — Killing the CMO org; rebuilding sales hiring. Notion dissolved its CMO function and rebuilt marketing as product-led. In the companion clip, Zhao discusses where they stumbled in enterprise sales: their first hire was a "systems thinker" suited to PLG order-taking, not outbound enterprise selling, and PLG masks the absence of real outbound capability because customers already want to buy.[13]Sequoia — Notion sales mistakes
~24:16 — Refounding #1: Kyoto and craft culture. Zhao describes leaving SF for Kyoto pre-PMF, absorbing Japanese craft culture, and rebuilding Notion's design philosophy around it.
~33:22 — Refounding #2: GPT-4 in Cancun. When GPT-4 dropped, Zhao went to Cancun to think it through and came back having rewritten Notion's AI strategy from the ground up.
"GPT-4 was a full-body religious experience. Like, holy shit — anything you do, if you don't do this, it will be meaningless."
~38:30 — Decalcifying via acqui-hired founders. Bringing in founders from acquired startups as a deliberate counter to the calcification that hits mid-stage companies.
~47:33 — Personal operating system and AMAs. How Zhao runs his own week — Notion-as-second-brain, regular internal AMAs, transparency rituals.
~52:35 — Enterprise sales: stop reinventing the wheel. Classic enterprise sales hasn't changed since the 1990s; modern internal tooling can improve efficiency but the motion itself is well-known. Don't try to innovate everywhere — concentrate the innovation budget on the few things that truly matter.
~56:39 — Company as religion, culture as cult. The closing frame: durable companies look more like religions than corporations — long-lived, ritual-driven, with great founders and great heads of sales.
"The Catholic Church is one of the most successful companies of all time — 2,000 years. Great founder in Jesus and a great head of sales in Paul."
"You have to feel the AGI. You can't read about it, you can't watch YouTube — you have to build."
A YC partner argues the Roman-legion org chart — humans relaying information up and down a hierarchy — is broken by AI. The replacement: recursive self-improving AI loops as the organizational substrate, not a productivity add-on. YC's own internal example: a monitoring agent watches employee queries, detects failures, writes the fix, opens a PR, has another agent review and deploy it — all overnight. YC's demo-day companies now show ~5× revenue per employee vs. 18 months ago.[14]YC — Self-Improving Company
~00:00 — The Roman Legion problem. Most companies are organized like Roman legions. Jack Dorsey's framing: this assumption is broken by AI.[14]YC
~01:00 — Beyond co-pilots. "AI as productivity booster" is the old frame. The new frame: companies as recursive self-improving AI loops, not engineers with a bolt-on tool.
~03:02 — The 5-layer loop. Sensor (inputs from the world) → policy (rules + permissions) → tool layer (deterministic APIs) → quality gate (evals/human review) → learning mechanism that feeds failures back in. The loop must run with minimal human intervention to compound overnight.
~04:02 — Live YC example. A monitoring agent watches all YC employee queries, identifies failures, writes fix code, opens a PR, has an agent review/merge/deploy. The same failing query succeeds the next morning. The speaker calls this the "holy sh** moment."
"For me, that was like the holy [bleep] moment. That's not just AI making you 20 or 30% more valuable. It is the AI going through this loop to figure out how to self-improve." ~05:02
~07:03 — Tokens > headcount. YC demo-day companies average ~5× more revenue per employee than 18 months ago. Measure token usage directionally. Middle management is replaced by AI coordination; every role becomes an IC with a single DRI.
"Burn tokens, not headcount. We are seeing companies get to demo day with about 5× more revenue per employee than they did 18 months ago."
~08:03 — Make the org legible to AI. Record everything — emails, Slack, DMs, office hours. "If it is not recorded, it does not exist to the AI." YC regenerated its entire 150-page user manual from 2,000 hours of recorded office hours in one weekend; it now self-updates monthly. Business context is the durable asset; software is ephemeral.
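The five-layer loop from the ~03:02 segment (sensor, policy, tool layer, quality gate, learning mechanism) can be sketched as a small class. This is an illustrative toy under stated assumptions, not YC's system: the class name, the string-based tools, and the failure log format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ImprovementLoop:
    allowed_tools: set                    # policy layer: which tools are permitted
    tools: dict                           # tool layer: deterministic callables by name
    quality_gate: Callable[[str], bool]   # evals or human review, reduced to a predicate
    failure_log: list = field(default_factory=list)  # input to the learning step

    def handle(self, event: str, tool_name: str) -> Optional[str]:
        """Run one sensed event through policy, tool, and quality gate."""
        if tool_name not in self.allowed_tools:
            self.failure_log.append((event, "blocked by policy"))
            return None
        result = self.tools[tool_name](event)
        if not self.quality_gate(result):
            self.failure_log.append((event, "failed quality gate"))
            return None
        return result

    def learn(self) -> list:
        """Hand accumulated failures to whatever writes fixes, then reset the log."""
        batch, self.failure_log = self.failure_log, []
        return batch


loop = ImprovementLoop(
    allowed_tools={"upper"},
    tools={"upper": str.upper, "reverse": lambda s: s[::-1]},
    quality_gate=lambda out: len(out) > 0,
)

ok = loop.handle("hello", "upper")         # passes policy and the quality gate
blocked = loop.handle("hello", "reverse")  # tool exists but policy does not allow it
empty = loop.handle("", "upper")           # runs, but the empty output fails the gate
pending = loop.learn()                     # failures that would feed the next fix cycle
print(ok, blocked, empty, pending)
```

The point of the structure is that every rejected event lands in one place, so the learning step has a concrete list of failures to turn into fixes without a human relaying them.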
OpenAI dropped a coordinated salvo of Codex features and a high-profile YC tie-in. Goals (via /goal) supports long-running tasks across app/IDE/CLI with the goal itself serving as both task prompt and completion criteria; Codex plugin sharing ships with a Shared-with-you tab and deep-link share URLs; and Appshots arrives as a new in-Codex artifact.[15] (OpenAI: Codex Goals) [16] (OpenAI: Codex plugin sharing) [17] (OpenAI: Appshots) Separately, OpenAI announced $2M of tokens (via an uncapped SAFE at Series A valuation) to every YC company in the Spring and Summer 2026 batches.[18] (YC: OpenAI $2M for every batch company)
~00:00 — The new /goal command activates a long-running task mode across the Codex app, IDE, and CLI.[15]OpenAI — Goals The goal itself is both the task prompt and the completion criteria. Codex can help users author goals via plan mode or an interview flow. Running goals support steering messages, non-interrupting side chats, and pause/resume. OpenAI cites a 100-hour single-goal run in the outro — a clear bid for the long-horizon territory that Anthropic's "Claude routines" has been targeting.
~00:00 — Outbound sharing to specific teammates or the whole workspace via a modal and share link, a "Shared with you" tab for inbound discovery, and deep-link share URLs for the curated plugin directory (demoed with a Slack plugin).[16]OpenAI — Plugin sharing
Codex now produces Appshots — a new artifact for shareable in-Codex snapshots.[17]OpenAI — Appshots
OpenAI is committing $2M in tokens (not cash) to every YC Spring 2026 and Summer 2026 batch company via an uncapped SAFE at Series A valuation.[18] (YC: OpenAI $2M for YC batches) The deal targets founders running agent-heavy "token-maxing" workflows. A special application window closes May 25, 2026, with decisions by June 5. Read in combination with the Anthropic ban story below, the timing is striking: OpenAI is buying ecosystem mind-share at exactly the moment Anthropic is restricting it.
Better Stack walks through Anthropic's multi-stage crackdown on third-party Claude harnesses (silent token blocking in January, a February ToS update, and April enforcement that included scanning Git history for keywords like "Open Claude" and "Hermes") and the new "programmatic credits" system that replaces it.[19] (Better Stack: Anthropic third-party ban) Credits are billed at full API rates, don't roll over, and require opt-in by June 15. The host's take: classic vendor lock-in, mirrored by OpenAI taking the opposite tack by including Codex in every ChatGPT subscription and offering enterprise Claude switchers two free months of Codex.
Subscription tiers come with API credits worth the subscription price — $20 Pro, $100 Max 5×, $200 Max 20× — but billed at full API rates, with no rollover.[19]Better Stack The host estimates $20 of Opus 4.7 usage burns in ~2 days; $200 may not last a heavy user a week. Opt-in required by June 15.
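The host's two estimates imply different usage rates, which a short calculation makes explicit. This is a sketch of the arithmetic only: the credit amounts and day counts come from the paragraph above, and the helper names are illustrative.

```python
def implied_daily_spend(credit_usd: float, days: float) -> float:
    """Average daily spend if `credit_usd` is used up in `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return credit_usd / days


def days_of_runway(credit_usd: float, daily_spend_usd: float) -> float:
    """How many days a monthly credit lasts at a steady daily spend."""
    if daily_spend_usd <= 0:
        raise ValueError("daily spend must be positive")
    return credit_usd / daily_spend_usd


# "$20 burns in ~2 days" implies roughly $10 per day.
light_rate = implied_daily_spend(20, 2)

# At that same rate, the larger tiers would last longer than a week.
print(days_of_runway(100, light_rate))  # Max 5x credit
print(days_of_runway(200, light_rate))  # Max 20x credit

# For $200 to run out within a week, daily spend has to exceed this figure.
print(round(implied_daily_spend(200, 7), 2))
```

At about $10 per day the $200 credit lasts 20 days, so the "may not last a week" scenario assumes a heavier user spending upward of roughly $28.57 per day.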
"Your $200 max subscription gives you $200 of API credits per month, which, if you're a heavy user, could be gone in a single afternoon."
"No announcement, no warning, just a silent update that broke workflows overnight."
"On the 6th of May, Anthropic signed a deal with SpaceX for over 220,000 GPUs. So, if compute was the problem, they just solved it."
OpenAI includes Codex in every ChatGPT subscription with no credit system, allows subscription use in third-party tools, opened its platform to Open Claude (3M users), and is offering enterprise Claude switchers two free months of Codex.[19]Better Stack The host's frame: Anthropic is putting up walls; OpenAI is tearing them down.
"While Anthropic is putting up walls, OpenAI is tearing them down."
"Anthropic is making up weird rules that is giving them free customers."
"The question now is whether the Claude models are still good enough to justify paying more to use them. Right now for me, the answer is yes, but the gap is closing very quickly."
claude -p, Open Claude, Conductor, Hermes, Sandcastle, T3 Code, Zed, Open Code, Codex, ChatGPT, Claude routines, Nano Claude
Better Stack reviews Routa, a free, open-source, local-first AI coding tool that treats AI-assisted development as a Kanban-based delivery pipeline (backlog → dev → review → evidence → done) rather than a chat session.[20] (Better Stack: Routa) It is model-agnostic via your own API key (the host used Claude), supports MCP and ACP, and is self-hostable via Docker Compose. The thesis: "the next step is not just smarter models, it's better coordination, better traces, better gates."
Routa targets three problems the host calls "chat hell" — context trapped in conversations, no traceability for AI decisions, and no quality gates (tests, diffs, acceptance criteria) enforced before merge.[20]Better Stack Stages are explicit: backlog → dev → review → evidence → done. Each work item has its own card with context, plan, diffs, and evidence — closer to how a delivery team operates than a chat thread.
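The staged pipeline with enforced gates can be pictured as a small state machine. The sketch below is an assumption-laden illustration, not Routa's implementation: the stage names come from the review, while the `Card` fields, the `REQUIRED` gate table, and `advance` are hypothetical.

```python
from dataclasses import dataclass, field

STAGES = ["backlog", "dev", "review", "evidence", "done"]

# Hypothetical gates: fields a card must have filled in before entering a stage.
REQUIRED = {
    "dev": ["plan"],
    "review": ["diff"],
    "evidence": ["approved"],
    "done": ["evidence"],
}


@dataclass
class Card:
    title: str
    stage: str = "backlog"
    plan: str = ""
    diff: str = ""
    approved: bool = False
    evidence: list = field(default_factory=list)  # e.g. test output attached to the card
    history: list = field(default_factory=list)   # trace of stage transitions


def advance(card: Card) -> str:
    """Move the card one stage forward, or raise if the next stage's gate is not met."""
    index = STAGES.index(card.stage)
    if index == len(STAGES) - 1:
        raise ValueError("card is already done")
    target = STAGES[index + 1]
    missing = [name for name in REQUIRED[target] if not getattr(card, name)]
    if missing:
        raise ValueError(f"cannot enter {target!r}: missing {missing}")
    card.history.append((card.stage, target))
    card.stage = target
    return target


card = Card(title="Add rate limiting", plan="limit each client to 5 requests per minute")
advance(card)  # backlog -> dev succeeds because a plan exists
try:
    advance(card)  # dev -> review is refused: no diff has been attached yet
    blocked_reason = ""
except ValueError as err:
    blocked_reason = str(err)
print(card.stage, "|", blocked_reason)
```

Keeping the transition history on the card is what gives traceability: the record of how work moved lives with the work item rather than in a conversation.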
It's model-agnostic via bring-your-own-API key, supports MCP and ACP agent protocols, and ships both a desktop app and a Docker Compose self-host. Comparisons the host makes:
"The next step is not just smarter models — it's better coordination, better traces, better gates."
Sequoia clips a 45-second pitch from Jake Stauch on Serval: keep traditional workflow and database primitives, but generate and update them via natural language. Describe a workflow (steps, permissions, approvals, logic) and the code is generated instantly; same pattern for data sources.[21]Sequoia — Serval
Stauch's pitch is that the building blocks (workflows, databases, permissions) are exactly the same as the past 20 years of internal-tool stacks — what changes is the authoring layer.[21]Sequoia Instead of clicking through a low-code UI, you describe the workflow in words and the code is generated and maintained for you. Same with data ingestion: describe the sources you want and Serval generates the fetch code and keeps it up to date.
Worth filing next to the Routa write-up above as another bet on opinionated, structured AI-assisted development — pulling away from chat-first interfaces.
Nate B Jones argues that traditional prompt engineering is now table stakes and no longer sufficient for Opus 4.7-class agents. The replacement: the "AI Question Method" — three principles for working with senior-partner-grade AI rather than over-instructing it like a junior hire.[22]Nate B Jones — Prompt Engineering Is Dead
~00:00 — The reframe. Stop treating AI like a junior employee that needs precise instructions; start treating it like a senior partner that benefits from open questions and directional intent.[22]Nate B Jones
~04:00 — Principle 1: Flashlight Intent. Frame questions with a central thesis (the beam center) plus explicit edges/exclusions. Avoids both over-open and over-prescriptive prompts.
~09:00 — Principle 2: Invite Synthesis. For complex creative outputs, pose multiple intersecting open-ended questions and let the AI synthesize across them — rather than writing rigid evals that constrain the answer. He claims this capability is meaningfully better in Opus 4.7 and 5.5.
~14:00 — Principle 3: Data + Opinion. When pointing the AI at a folder of files, explicitly name data artifacts alongside your thesis-as-question so it ranges across all sources rather than drilling into one.
~22:00 — Closing. Nate calls this a real inflection point and pushes for a full vocabulary shift from "prompt engineering" to "AI questioning."
"The words were never the things that mattered the most in prompt engineering. The intent was always what mattered."
AICodeKing publishes a tips guide for using Google's Antigravity 2.0 despite the host's stated dislike of it. The fixes — labeled "KingGravity" — center on plugging in Anthropic's front-end design skill plus awesome-design.md for UI quality, tuning the King mode prompt with "don't plan unless asked," enabling browser tools, and using the Karpathy skill.[23]AICodeKing — KingGravity
Four threads from the video:[23]AICodeKing
awesome-design.md for UI quality; add "don't plan unless asked" to the King mode prompt to keep responses structured.
awesome-design.md, King mode prompt, VS Code, Karpathy skill
A ~90-second short in which Nate B Jones, citing Andrej Karpathy, argues that AI writing detection in schools is mathematically impossible to implement correctly, and is already harming students.[24] (Nate B Jones: AI Detection)
The core claim: there is no reliable way to distinguish AI-generated prose from human prose at a per-document level, and false-positive rates are high enough that the policy generates real harm — students getting flagged and penalized for writing that they actually wrote.[24]Nate B Jones
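The false-positive concern is a base-rate effect, which a worked example makes concrete. Every number below is hypothetical and chosen only for illustration; none of them comes from the video or from any real detector.

```python
def flag_breakdown(n_essays: int, ai_share: float,
                   true_positive_rate: float, false_positive_rate: float):
    """Return (correct flags, wrongful flags, share of all flags that are wrongful)."""
    ai_written = n_essays * ai_share
    human_written = n_essays - ai_written
    true_flags = ai_written * true_positive_rate
    false_flags = human_written * false_positive_rate
    return true_flags, false_flags, false_flags / (true_flags + false_flags)


# Hypothetical scenario: 1,000 essays, 10% AI-written,
# a detector that catches 90% of those and wrongly flags 5% of human work.
true_flags, false_flags, wrongful_share = flag_breakdown(1000, 0.10, 0.90, 0.05)
print(f"correct flags: {true_flags:.0f}")
print(f"wrongful flags: {false_flags:.0f}")
print(f"share of flags that hit students who wrote their own work: {wrongful_share:.0%}")
```

Under these assumed rates, about 45 of 135 flags (one in three) land on students who wrote their own essays, even though the detector looks accurate on paper.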
A short clip from The Pragmatic Engineer: Alice Ryhl pitches Rust to TypeScript developers as a back-end language, emphasizing reliability and bug reduction for server-side workloads like API servers.[25]Pragmatic Engineer — Alice Ryhl
Ryhl frames Rust's appeal for TypeScript devs not in terms of performance but in terms of reliability — fewer production bugs on the server side, with the type system catching what JavaScript's wouldn't.[25]Pragmatic Engineer