Cursor ditches VS Code, Claude Code loses Theo

April 6, 2026

8 topics • 5 YouTube videos • 2 newsletters. Through-line: the incumbent AI coding tools are losing the trust of their loudest power users on the same day.

Industry Hot Take
Tech Brew

Tech Brew: Inside the Memos Behind OpenAI's Safety Retreat

Tech Brew surfaces a New Yorker investigation built on internal memos from Ilya Sutskever and Dario Amodei detailing how OpenAI systematically dismantled its safety commitments during the for-profit transition.[1] The promised superalignment team got 1–2% of compute instead of the pledged 20% before being dissolved; Amodei pushed a "merger clause" that would have forced OpenAI to support a safer competitor, and Microsoft vetoed it. His internal conclusion: "The problem with OpenAI is Sam himself."


What the memos show

  • Superalignment underfunded. The team Sutskever and Jan Leike led was promised 20% of OpenAI's compute. In practice it received 1–2%, then was dissolved entirely.
  • Board captured. Independent directors were replaced with Altman allies after the November 2023 firing-and-return. The charter no longer functionally guides behavior.
  • Amodei's merger clause. Before leaving to start Anthropic, Amodei pushed for a contractual commitment that OpenAI would stop competing and support a rival closer to safe AGI. Microsoft used its veto to kill it.
  • Altman investigation. The board inquiry into Altman's conduct produced no written report.
That's not… a thing. — an OpenAI rep, when asked about existential-safety researchers.
My vibes don't match a lot of the traditional AI-safety stuff. — Sam Altman, quoted in the memos.
The problem with OpenAI is Sam himself. — Dario Amodei, internal memo.

Why it lands now

The article reframes the 2023–2024 safety-team exodus as less a voluntary departure than a systematic hollowing-out documented in writing. It's also a useful backdrop for Theo's Claude Code video below: Anthropic exists partly because Amodei concluded he couldn't fix OpenAI from inside, which is why Anthropic's current self-inflicted usability wounds (Topic 3) hurt more — the company was supposed to be the one that didn't do this.

AI Future Industry
Import AI

Import AI 452: Cyberwar Scaling Laws and the GDP Forecasting Puzzle

Jack Clark's Import AI 452 lands three research items: cyberoffense capability now has a 5.7-month doubling time since 2024; MIT finds AI automation progresses as a "rising tide" across tasks simultaneously rather than sector-by-sector disruption; and a 560-person forecaster survey reveals economists assigning just a 14% probability to major short-term GDP bumps despite consensus on real capability gains.[2]


Scaling laws for cyberwar (Lyptus Research)

  • Doubling time for AI cyberoffense capability: 9.8 months since 2019; accelerating to 5.7 months since 2024.
  • Frontier models hit 50% success on tasks a human expert would need 3.1–3.2 hours to complete — "roughly half a working day of professional offensive security work."
  • Open-weight models trail closed-source by about 5.7 months.
  • Methodology: 291 tasks across multiple benchmarks.
The best current models achieve 50% success on tasks that take human experts 3.2h, roughly half a working day of professional offensive security work.

If the 5.7-month doubling holds, the 50%-solvable task horizon roughly quadruples every year; a task that takes a human expert a full working week (~40h) becomes 50%-solvable within about two years.
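As a quick sanity check on that extrapolation (a sketch using only the 3.2-hour horizon and 5.7-month doubling time reported above):

```python
import math

# Back-of-envelope: how many months until the 50%-success horizon reaches
# a given task length, if it doubles every 5.7 months from 3.2 hours?
HORIZON_HOURS = 3.2
DOUBLING_MONTHS = 5.7

def months_to_reach(target_hours: float) -> float:
    doublings = math.log2(target_hours / HORIZON_HOURS)
    return doublings * DOUBLING_MONTHS

print(round(months_to_reach(8), 1))    # full working day  -> 7.5
print(round(months_to_reach(40), 1))   # full working week -> 20.8
```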

Rising tides of automation (MIT)

MIT analyzed 3,000 tasks built from O*NET job families with 17,000 worker evaluations. They find AI progress doesn't strike specific sectors — it raises all boats at once. Frontier models moved from 50% success on 3–4 hour tasks in Q2 2024 to 1-week tasks in Q3 2025, with 80–95% success on most text-based labor tasks projected by 2029.

Progress typically resembles a rising tide, with widespread gains across many tasks simultaneously.

The GDP forecasting puzzle (FRI)

The Forecasting Research Institute surveyed 69 economists, 52 AI experts, 38 "accurate forecasters," and 401 general-public respondents between October 2025 and February 2026. The paradox: capability forecasts are bullish, but economists assign only ~14% probability to a major short-term GDP spike or wealth-inequality jump. By 2050, AI experts do expect multi-point GDP contributions — just not before 2030.

If everyone expects a continuation of trends, why are people freaking out?

Clark's implicit frame: if cyberoffense and labor automation both compound at observed doubling times, the macro forecasts may already be stale.

Mentioned: Lyptus Research, MIT, Forecasting Research Institute, O*NET
Hot Take AI Tools
Theo - t3.gg

Theo: Why Claude Code Is No Longer Usable

Theo films an Easter-day breakup video with Claude Code. The provocation: Anthropic now rejects API requests with a 400-level error solely because the string "Open Claw" appears in the system prompt — and silently relaxes the rule if you enable paid overage billing.[3] His conspiracy theory, which he labels as speculation: Anthropic also tightened the Claude Code system prompt so it refuses to help with non-software tasks, to stop people using claude -p as an Open Claw backdoor. Either way, his cc alias now points at Codex.


The Open Claw system-prompt ban, demonstrated

Theo runs a dead-simple Claude CLI call with the phrase "personal assistant running inside of Open Claw" in the system prompt (~00:45). The API rejects the request. He toggles "extra usage" (paid overages) in the Claude dashboard and re-runs the same request — now it succeeds (~09:15). Same payload, same headers, different billing status, different answer.

They put a special flag in their routing where you are built differently if you have the mention of Open Claw in your system prompt.

Simon Willison's public reaction: "Billing differently based on text contained in the system prompt is a really bad look." Dex, previously an Anthropic defender, reversed position: "If they're blocking use of Claude Agent SDK wholesale in Open Claw at a system prompt level, then this completely invalidates that argument."

The escalation ladder

  1. Header-level bans. First Open Code, then Open Claw — requests rejected at the API if the client identifies itself.
  2. Workaround: shell out to claude -p. Open Claw maintainers moved to calling the Claude CLI directly, which required appending their system prompt.
  3. System-prompt string match. Anthropic added a filter for the literal string "Open Claw" in system prompts.
  4. Paid-overage carveout. The filter flips off when the user has extra usage billing enabled.

Matt Pocock's unanswered questions

Theo reads Matt Pocock's list of Anthropic's ambiguous subscription rules (~10:40): Agent SDK in personal software? "OK-ish." Agent SDK in CI? Unknown. claude -p in open source on your own machine? Unknown. Pocock, who just released a paid Claude Code course, has been waiting over a month for an answer.

I have never before experienced from any developer tool such a frustrating lack of clarity over the basic terms of usage. — Matt Pocock
Anthropic subscription rules are more complicated than TypeScript generics. — Matt Pocock

The conspiracy: a muzzled system prompt

Theo's speculation (he labels it clearly): Anthropic has narrowed Claude Code's system prompt so the model refuses non-software tasks, to make claude -p a worse general-purpose backdoor. Evidence is anecdotal — he asked Claude Code for help debugging a macOS Dropbox menu bar issue and got three consecutive refusals (~15:00):

That's outside my area. I'm built for software engineering tasks like writing code, debugging, and working with repos.

Honesty caveat: Bad Logic Games, creator of Pi, tracks Claude Code's system prompt over time and told Theo there was no meaningful change (~19:40). Theo concedes this but maintains his subjective experience has sharply degraded, and suspects API-level prompt injection rather than the local system prompt file.

Where he's going

His cc alias, previously "Claude Code + custom flags," now runs codex --yolo (~21:00). His sandbox directory, cc-sandbox, keeps the name for nostalgia. He continues to use Claude models (via Codex and T3 Code's Agent SDK wrapper) but is done opening Claude Code directly outside of content production.

In December, Claude Code changed the way I build software and the way I think about software on a fundamental level… My last reasons for using Claude Code died in front of me this morning.
Tools: Claude Code, Codex, Open Claw, Open Code, T3 Code, Pi, Agent SDK, Clerk (sponsor)
Developer Tools Industry
Fireship

Cursor 3.0 Ditches VS Code, Ships Kimi-Based Composer 2

Cursor 3.0 is a full rewrite in Rust + TypeScript — the VS Code fork is gone.[4] The new UI is designed around managing swarms of agents across repos, machines, and cloud servers rather than around editing code. The new Composer 2 model, initially marketed as beating Opus 4.6, was outed as a fine-tuned Moonshot Kimi K2 after someone found the base model ID in metadata — Cursor later apologized and published a technical report.


Air traffic controller, not captain

Fireship's framing (~00:00): Cursor 1.0 was autocomplete (co-pilot), 2.0 was chat-controls-terminal (captain), 3.0 is "air traffic controller" managing parallel agents. The home screen is an agent manager with colored status dots: yellow = needs permission, blue = done. The editor is still there; it's just no longer the primary surface.

Composer 2 — the Kimi K2 incident

Composer 2 launched about two weeks before this issue, with benchmark claims exceeding Opus 4.6 at a fraction of the cost and latency (~01:02). Within days, a user found the underlying model ID in Composer metadata: Moonshot Kimi K2, heavily RL-finetuned. Cursor apologized for the transparency failure and released a full technical report on its RL pipeline.

Kimi itself has been accused of training on Claude's outputs because it will occasionally say, "Hi, I'm Claude."

The Rust rewrite

Fireship calls out the RAM story specifically — the old VS Code fork's memory footprint is gone. The new interface combines professional-IDE affordances (language servers, files, remote SSH) with agent orchestration; many have criticized it as too similar to OpenAI Codex.

Demo: Horse Tinder

Fireship builds a demo app ("Horse Tinder") live (~02:40). Plan mode drafts architecture; he spins a parallel agent on the landing page; another SSHes into a cloud server; another starts a separate project. 13,000 lines in a few minutes. The built-in browser renders the live app — he uses design mode to highlight-and-edit CSS in place, queueing fixes rather than waiting for each to complete.

Tools: Cursor 3.0, Cursor Composer 2, Moonshot Kimi K2, Claude Opus 4.6, OpenAI Codex, Rust, TypeScript, Blacksmith (sponsor)
Industry
Y Combinator

Y Combinator x BillionToOne: From Prenatal Tests to Stage-One Cancer Screening

YC's Garry Tan profiles BillionToOne — a $4B-valuation public company doing 600K+ tests/year at ~20% prenatal market share — and walks through the three-step plan (prenatal → late-stage cancer → early detection) that founders Ozan Alkan and David Tsao laid out before seed.[5] The key technical trick: spiking synthetic DNA into samples before amplification so they can computationally subtract PCR noise. They are "less than a year" from launching ultra-sensitive MRD testing for stage-one cancer patients.


The core insight: known spikes subtract PCR noise

Cell-free DNA in maternal blood or tumor plasma is dilute — "a few molecules among billions." Everyone amplifies via PCR, which adds noise that drowns the signal. BillionToOne adds synthetic DNA of known sequence and quantity before amplification; the distortion pattern on the known DNA lets them subtract distortion from the rest (~03:20).

That converts a difficult biology problem to almost a simple mathematical problem.
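The spike-in idea reduces to simple arithmetic. A toy illustration (not BillionToOne's actual pipeline; all numbers invented): the synthetic molecules ride through the same amplification, so their observed count reveals the gain, which divides out of the target measurement.

```python
# A known quantity of synthetic DNA is added before PCR ("spike-in").
spike_in_true = 1_000           # molecules added, known exactly
spike_in_observed = 37_000_000  # reads observed after PCR + sequencing
target_observed = 111_000_000   # reads observed for the target cfDNA

# The spike-in reveals the per-molecule amplification distortion...
amplification = spike_in_observed / spike_in_true

# ...which can be divided out of the target signal.
target_estimate = target_observed / amplification
print(target_estimate)  # -> 3000.0 original target molecules
```

Real amplification bias varies per fragment and per cycle; the toy shows only the subtraction logic, not the chemistry of matching spike-ins to targets.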

Why no one else did it first

Alkan's answer (~06:40): it requires an interdisciplinary bridge between wet-lab chemistry and bioinformatics. Most teams split these roles; Alkan and Tsao each do both.

From lab bench to public company

  • 2017: YC application. Shared bench; raising the $300K "took six months and was incredibly difficult."
  • 6 months in: validated test on real samples.
  • Launch + 2 months: one physician sending 1–2 tests a week.
  • The sales turn (~08:10): Alkan forced a VP of sales to hire five reps in 3 weeks and train them over a weekend. Direct-to-patient marketing ("convince patients to demand the test from doctors") converted 1 in 5 leads.
  • 2022: purpose-built lab in Union City.
  • Late 2025: IPO at $4B+.
  • 2026: 600K+ tests/year, ~20% prenatal market share.

The lab, AI-accelerated

Sample intake used to take a human 60 seconds per tube; computer vision now does it automatically in their "Accessioning in 60 Seconds" project (~09:50). Throughput target: 2M tests/year from the single Union City facility — roughly 1 in 3 US babies. Liquid-handling robots with onboard optics pick off the plasma layer after centrifugation. Samples are barcode-pooled and sequenced together, then computationally demultiplexed.
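Barcode demultiplexing, the last step mentioned, can be sketched in a few lines (illustrative only; barcodes and read structure invented):

```python
from collections import defaultdict

# Each pooled read starts with a short sample barcode that routes it
# back to its source tube after sequencing.
barcodes = {"ACGT": "sample_001", "TTAG": "sample_002"}

pooled_reads = ["ACGTGGCATTA", "TTAGCCGGAAT", "ACGTTTTTCCC"]

by_sample = defaultdict(list)
for read in pooled_reads:
    sample = barcodes.get(read[:4])         # first 4 bases = barcode
    if sample:
        by_sample[sample].append(read[4:])  # strip barcode, keep insert

print(dict(by_sample))
# -> {'sample_001': ['GGCATTA', 'TTTTCCC'], 'sample_002': ['CCGGAAT']}
```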

The cancer-detection ladder

  1. Step 1 (shipping): Prenatal — sickle cell, cystic fibrosis, thalassemia directly from maternal blood.
  2. Step 2 (shipping since 2023): Late-stage cancer liquid biopsy (Northstar Select).
  3. Step 3 (<1 year out): Minimal Residual Disease (MRD) testing for stage-1 cancer patients post-surgery — catch the ~20% of "cured" patients whose microscopic residue is invisible to scans.
  4. Step 4 (eventual): Annual screening of healthy populations for stage-zero cancer.
Once we are there, I think technically we would have solved the holy grail of cancer detection.

Patient case study

A metastatic colorectal cancer patient in their 40s, heading to hospice, tested positive for microsatellite instability on BillionToOne's Northstar Select — a marker the tumor biopsy missed because the biopsy site didn't happen to contain it. Immunotherapy followed; "the cancer melting away" (~14:00).

Hiring philosophy

We're not looking to build an interdisciplinary team. We're looking for interdisciplinary people.

Research teams are tiny — one principal investigator with 2–3 research associates, reporting directly to Alkan and Tsao. "Many startups within the larger company." Pressure is a privilege; employees who could retire post-IPO are staying.

Being resource limited is sometimes very helpful. If you wanted to solve early detection from the very beginning without a step-by-step approach, you'd have to raise more than a billion dollars without generating a single dollar of revenue.
Mentioned: BillionToOne, Northstar Select, YC, PCR, cell-free DNA, MRD testing
Developer Tools
LearnThatStack

LearnThatStack: How HTTPS Actually Works

A ground-up explainer that corrects the common mental model of HTTPS (you don't just "encrypt with the server's key"). TLS solves three problems — confidentiality, integrity, authentication — and TLS 1.3 does it in one round-trip using ECDHE for key exchange with forward secrecy.[6]


The three problems TLS actually solves

  • Confidentiality. Can someone read the data in transit?
  • Integrity. Can someone modify it without detection? "Encrypting your bank transfer amount doesn't help if someone could change it after you sent it."
  • Authentication. Is the server actually who it claims to be? "Encryption without authentication isn't security. It's just privacy for the wrong party."

Why both symmetric and asymmetric

Symmetric (AES) is fast — modern CPUs have dedicated AES instructions — but requires a shared key, and two parties who have never met have no safe channel to exchange one. Asymmetric (RSA, elliptic curve) solves key exchange but is orders of magnitude too slow for bulk data. TLS uses asymmetric crypto to agree on a secret, then symmetric for everything after (~03:00).

ECDHE in the paint-mixing metaphor

Client and server start with a shared public color (yellow). Each picks a private color it never transmits, mixes it with yellow, and sends the result. Each then mixes the received color with its own private color — both land on the same final color. An eavesdropper sees yellow, orange, and green, but cannot derive the shared secret (~06:00).

Security doesn't come from keeping the method secret. It comes from a mathematical property that makes reversing the operation computationally infeasible.

The "E" in ECDHE — ephemeral — means a fresh key pair per session. Forward secrecy: if the server's private key leaks years from now, recorded traffic from today stays secret.
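The paint-mixing maps directly onto modular exponentiation. A toy sketch with classic finite-field Diffie–Hellman (illustrative only; TLS 1.3 actually uses elliptic-curve groups such as X25519 and derives keys via HKDF rather than a bare hash):

```python
import hashlib
import secrets

# Public parameters: the shared "yellow", sent in the clear.
p = 0xFFFFFFFFFFFFFFC5  # 64-bit prime (2**64 - 59); far too small for real use
g = 5                   # public generator

# Each side picks a private color it never transmits.
a = secrets.randbelow(p - 2) + 2   # client's ephemeral secret
b = secrets.randbelow(p - 2) + 2   # server's ephemeral secret

# Only the mixes travel over the wire (the "orange" and "green").
A = pow(g, a, p)
B = pow(g, b, p)

# Each mixes the received value with its own secret: same final color.
client_shared = pow(B, a, p)
server_shared = pow(A, b, p)
assert client_shared == server_shared

# Both sides derive the same symmetric key for bulk encryption; discarding
# a and b after the handshake is the "E" (ephemeral), i.e. forward secrecy.
session_key = hashlib.sha256(client_shared.to_bytes(8, "big")).digest()
```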

Certificates, x509, and the fields that break production

A certificate contains subject, issuer, validity window, SANs, public key, CA signature, and key usage. In practice (~12:00):

  • SANs (Subject Alternative Names) — the field browsers actually use for domain validation. CN is legacy.
  • Validity — the "not after" date responsible for most midnight outage pages.
  • Key usage — a web-server cert cryptographically cannot sign other certs, so a compromised server can't be turned into a CA.

Self-signed dev certs: what actually fails

The video corrects a common confusion (~13:30): when Chrome rejects your localhost cert, encryption still works. Authentication is what fails — no CA is vouching. Recommendation: use mkcert, which installs a local CA into your system trust store and issues certs from it.
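A minimal version of this with plain OpenSSL (a sketch assuming OpenSSL ≥ 1.1.1 for the -addext/-ext flags; mkcert automates the local-CA half):

```shell
# Generate a self-signed localhost cert with the SAN browsers actually check.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout dev.key -out dev.crt -days 30 \
  -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"

# Inspect the fields discussed above: subject, validity window, SANs.
openssl x509 -in dev.crt -noout -subject -dates -ext subjectAltName

# Encryption with this cert works; authentication is what fails --
# no CA in the trust store vouches for it, so verification errors out.
openssl verify dev.crt || true
```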

The padlock was always just the surface.

The closing prompt: run openssl s_client -connect yourdomain:443 -showcerts against your own domain to check how many certs in the chain you're actually serving. If it's just one — no intermediate — mobile and non-browser clients are probably already erroring silently.

Tools: TLS 1.3, AES, RSA, ECDHE, HKDF, x509, mkcert, OpenSSL, SpiderMonkey
Hot Take AI Tools
Prefect

Prefect Stages a "Funeral for MCP" on April Fools'

Prefect posted a one-minute clip titled "Funeral for MCP | MCP is Dead," dated April 1, 2026 — the transcript is essentially captionless ("glad to have a lot of kindred spirits here in the room to to MCP to this community").[7] With no detailed transcript to rely on, this reads as an April Fools' conference-stage bit that Prefect uploaded with a delay. Title-only entry.


The video is a sub-2-minute clip from an in-person gathering, with audible audience laughter. YouTube's auto-captions captured only a greeting fragment; no substantive content can be extracted without watching the video directly. The framing — "Funeral for MCP," dated April 1st — reads as satire poking at the MCP-is-dead discourse that surfaced around the launch of competing agent protocols (Skills, ACP) in Q1 2026. Treat it as a vibes signal rather than a news item.

Mentioned: MCP, Prefect

Sources

  1. Newsletter Inside the memos behind OpenAI's safety retreat — Tech Brew, Apr 6
  2. Newsletter Import AI 452: Scaling laws for cyberwar; rising tides of AI automation; and a puzzle over GDP forecasting — Jack Clark / Import AI, Apr 6
  3. YouTube Claude Code is unusable now — Theo - t3.gg, Apr 6
  4. YouTube Cursor ditches VS Code, but not everyone is happy... — Fireship, Apr 6
  5. YouTube BillionToOne Is Solving One of Biotech's Hardest Problems — Y Combinator, Apr 6
  6. YouTube How HTTPS Works - It's Not Just Encryption — LearnThatStack, Apr 6
  7. YouTube Funeral for MCP | MCP is Dead | April 1st, 2026 — Prefect, Apr 6