April 9, 2026
Anthropic announced Claude Mythos — a model it says is too dangerous to release generally — and launched Project Glasswing, giving exclusive access to Apple, Google, Microsoft, Cisco, and Broadcom for defensive cybersecurity work.[1]Tech Brew — Mythos continues to mystify Mythos reportedly found a 27-year-old OpenBSD bug, converted Firefox vulnerabilities into working exploits 180 times out of several hundred attempts (vs. twice for Opus 4.6), and "escaped its testing environment." Two YouTubers push back from different angles: AICodeKing calls the framing pure marketing, given that the headline finds happened at a ~0.05% hit rate across thousands of $20+ runs[2]AICodeKing — Claude Mythos DEBUNKED: This is ALL MARKETING; Caleb Writes Code reads it as a privatization of tokens in which only investor companies get access to frontier intelligence at $125/M output tokens.[3]Caleb Writes Code — Claude Mythos explained
Kelsey Piper's concern (cited by Tech Brew) is the structural one: "one private company" now holds zero-days on almost every major software project, and Anthropic declined to say whether it briefed the Pentagon, which recently designated Anthropic a supply-chain risk.[1]Tech Brew — Mythos continues to mystify Experts estimate rivals will reach parity in ~6 months, making that window the actual worst case.
AICodeKing went to the system card itself (~01:00). Anthropic ran Mythos over a thousand times to find the OpenBSD bug. The total cost was $20K+; the specific run that caught it cost ~$50. That's a 0.05% hit rate even after the model was given search, MCP, sub-agents, compiler access, and — critically — "harmlessness safeguards removed" for some evaluations.
For red teaming, uplift trials, and knowledge-based evaluations, we equipped the model with search and research tools… When necessary, we used a version of the model with harmlessness safeguards removed to avoid refusals.
His read (~03:00): strip the guardrails and give any frontier model the same rig, and you would probably get similar results from GPT-5.4 or Opus. Google already did this with Gemini on FFmpeg. The "too dangerous to release" framing is recycled from OpenAI's 2019 GPT-2 playbook and, post hoc, looks more like a way to avoid shipping a $125/M-output-token model that wouldn't move many subscriptions.
Giving access to major companies is fine, apparently, but to end users, it is not.
Caleb Writes Code reframes the story not as security but as access asymmetry (~01:50): Glasswing recipients are almost all prior Anthropic investors — Microsoft (Series C/G), Nvidia (G), JP Morgan (May 2025 conventional loan), Google (C/E + convertible debt), Amazon (D/E), Cisco (E). Intelligence as an asset, allocated to existing equity holders first.
What we're looking at is a privatization of tokens where access to higher level intelligence is limited down to few critical companies that Anthropic found necessary.
On benchmarks (~05:00): SWE-bench Pro jumps from 53% (Opus) to 77% (Mythos) — a leap Caleb compares to the GPT-4o → o1 reasoning leap in September 2024, though he notes DeepSeek R1 closed that gap in 5 months. Pricing is the real gate: $125/M output tokens, second only to GPT-5.4 Pro at $180. That effectively excludes Mythos from Anthropic's Pro/Max subscription tiers — Claude Code users won't see it unless they're paying API prices on top.
No one in their right minds would spend the $125 per million output tokens in API to use Claude Mythos for OpenClaw unless the return on investment justifies reaching for the top shelf.
His macro framing (~06:30): Anthropic recently overtook OpenAI at $30B in annualized run rate (a recency-weighted projection, not ARR — the distinction matters a lot for IPO valuation). The Mythos release aligns Anthropic with a possible 2027 IPO narrative.
Tech Brew reports Anthropic's story straight; AICodeKing says the scaling math and ablation choices make it a marketing document; Caleb reads the investor list and says this is how intelligence gets rationed when demand exceeds supply. All three agree that within ~6 months, another lab catches up and the "exclusive partner" moat disappears.
The first episode of Theo and Ben's new podcast (rebranded to "Codex Podcast" after TBPN turned out to be taken) is 80 minutes of running diagnosis of what they're calling Anthropic Week: rate-limit crunches, the Claude Code source leak (the dist folder wasn't nuked between builds), the subscription crackdown on third-party harnesses, an 8,100-repo DMCA spray that hit Theo personally, and a closing enthusiastic pivot to Pi — a minimal four-tool coding harness they argue is the anti-Claude-Code.[4]Theo — Crashing out at Anthropic and getting Pi pilled
Theo's central argument (~05:20) is that Anthropic built its communications strategy when everyone liked it and never updated it. The official accounts post only positive news; negative news — rate-limit changes, subscription changes — gets delegated to individual employees' personal Twitter accounts (Thoric, Boris, Lydia). To stay informed as a paying customer, you have to follow specific engineers on Twitter, not any official channel.
I'm legitimately starting to feel as though some of the bad things Anthropic is doing is purely because the alternative is accepting I was right.
Theo's origin-story reading (~21:00): Claude Code leads Boris and Kat briefly left for Cursor, and Anthropic introduced the $100/$200 subscriptions (which subsidize tokens at 5–20x their raw API value) specifically as marketing spend to bring Boris and Kat back. Every economic problem since — the OpenClaw shutdown, the rate-limit crunches, the third-party harness ban — traces back to that subsidy not being sustainable.
The only reason they did [the subscriptions] was in order to get Boris and Kat back. And every optics disaster that's happened since stems directly from that mistake.
Per official harness rankings (~39:00), Claude Code is not in the top three, or even the top ten, harnesses for Opus performance — it ranks 12th. Using Opus in any other harness produces better code.
Theo's forensic reconstruction (~59:00): GitHub's DMCA process is one underpaid person with an email inbox. Anthropic sent a bulk report that likely said "ban this and all forks," and GitHub interpreted that to include forks of the legitimate public Claude Code repo (docs, skills, readme), hitting ~8,100 innocent repositories. Theo himself was struck for a one-line improvement PR to the front-end skill. The strike has been reversed, but Anthropic now holds the record for "most DMCA strikes reversed on GitHub in history."
To Anthropic: I've signed NDAs. I'll sign more. Show me the email you sent GitHub so I can defend you. You don't give me what I need to defend you, but you give me lots of things to tear you to shreds.
The last 20 minutes (~67:00) shift hard to Pi, a minimalist coding agent: four tools (read, write, exec, one more), a ~20-line system prompt, no MCP, no LSP, no auto-loading of AGENTS.md or skills. The philosophy: Claude Code's performance problem is the tens of thousands of tokens of bloat in every tool call, and Pi inverts that (a sketch of the loop follows the quote below).
The things it doesn't do is how I was able to make the things that it does.
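To make the four-tool idea concrete, here is a minimal sketch of what a harness in Pi's spirit could look like. This is not Pi's actual code (Pi is TypeScript, and the episode doesn't name the fourth tool); the tool names match the episode, everything else is illustrative, and the model call is left as a stub:

```python
import json
import subprocess

# Three of the four tools (the episode leaves the fourth unnamed here).
# No MCP, no LSP, no auto-loaded AGENTS.md or skills -- just this dict.
TOOLS = {
    "read":  lambda path: open(path).read(),
    "write": lambda path, text: open(path, "w").write(text),
    "exec":  lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True
    ).stdout,
}

# A system prompt on the order of 20 lines, per the episode; abbreviated here.
SYSTEM_PROMPT = ('You are a coding agent. Reply with JSON: '
                 '{"tool": name, "args": [...]} or {"done": summary}.')

def call_model(messages):
    """Stub for the LLM call -- hypothetical, not Pi's API."""
    raise NotImplementedError

def agent_loop(task, max_steps=20):
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": task}]
    for _ in range(max_steps):
        action = json.loads(call_model(messages))
        if "done" in action:
            return action["done"]
        result = TOOLS[action["tool"]](*action["args"])
        # The only thing appended per step is the raw tool result -- none of
        # the per-call context a heavier harness injects.
        messages.append({"role": "user", "content": str(result)})
```

The whole harness is the dict and the loop; everything a heavier harness ships with each tool call simply doesn't exist here, which is the claimed performance win.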
Ben's BTCA project (research tool for grounding coding agents in docs) moved from Open Code to Pi specifically to escape auto-loaded AGENTS.md and MCP pollution. Pi's extension system lets you hijack the TUI to build domain-specific agents as TypeScript extensions.
A short Lenny's Podcast clip reframes the "are PMs still needed?" question with a specific org-shape answer: Claude Code turns a default 5-engineer / 1-PM / 1-designer team into the effective equivalent of 15–20 engineers, 2 PMs, 2 designers — and the PM/design ratio has gotten badly squeezed as a consequence.[5]Lenny's Podcast — Do we still need PMs?
With Claude Code, that five engineers is like two to three x. And the PMs and designers have also increased, but now they're managing what is effectively a much larger group of engineers… we just need to actually hire a ton more PMs.
The clip inverts the usual "AI replaces PMs" narrative: the bottleneck moves to PM and design because each engineer's throughput scales, not their headcount. Roughly aligned with the Stay SaaSy "armies of one" thesis from the April 13 Latent Space episode, just from the opposite org-chart vantage point.
Anthropic shipped an Advisor Strategy for the Messages API that pairs an executor model (Sonnet or Haiku) with Opus as an on-demand adviser — the executor only calls the adviser when it detects a hard step. Nate Herk's live tests show roughly 21x cost reduction on easy prompts (Opus solo vs. Haiku solo) and a 41.2% BrowseComp score for Haiku+Opus vs. 19.7% for Haiku alone — more than double — while still cheaper than pure Opus.[6]Nate Herk — Claude Just Told Us to Stop Using Their Best Model
The API request gains a type: advisor_20263_01 field plus a max_uses cap (~04:20). The executor (Haiku or Sonnet) handles most of the conversation; when it decides a step is hard enough, it silently escalates to Opus and folds the response back in. It's all gated behind the Messages API, not Claude Code — this is for people building apps, not subscription users.
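A minimal sketch of what such a request might look like, assuming the field names shown in the video (type: advisor_20263_01, max_uses); the placement of the advisor block, the model ids, and the rest of the schema are guesses, since the sources don't document the exact API:

```python
import requests

# Hypothetical payload: "type": "advisor_20263_01" and "max_uses" come from
# the video; the "advisor" block's placement and the model ids are assumed.
payload = {
    "model": "claude-haiku-4-6",      # executor (assumed id)
    "max_tokens": 1024,
    "advisor": {
        "type": "advisor_20263_01",
        "model": "claude-opus-4-6",   # on-demand adviser (assumed id)
        "max_uses": 3,                # cap on silent escalations per request
    },
    "messages": [
        {"role": "user", "content": "What are your business hours?"}
    ],
}

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": "YOUR_API_KEY",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json=payload,
)
print(resp.json())
```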
Nate built a front-end dashboard calling the Messages API with a toggle (Haiku+Opus, Sonnet+Opus, Sonnet solo, Opus solo). On the easy query "what are your business hours?", Haiku answered for ~21x less than Opus solo (~06:15). On a medium question about returning a hardware + software bundle, Haiku+Opus didn't escalate but Sonnet+Opus did (even though the answers were equivalent) — there's real uncertainty about when each executor decides to promote to the adviser.
Saving cost on tokens is amazing, but only if you're not sacrificing quality. It seems like whenever Opus is in the loop with Haiku or Sonnet, you are getting a better output than if it's just Sonnet alone.
For users on the subscription instead of the API, Nate surfaces a "hidden" Claude Code model called opus-plan (~11:10): runs Opus 4.6 in plan mode, Sonnet 4.6 everywhere else. Side-by-side with pure Opus on the same visualization task: similar quality, much smaller session usage. Not the adviser strategy, but the same idea — stop using the most expensive model for steps that don't need it.
OpenAI published a 13-page policy document arguing for worker voice, AI-first entrepreneurship, a "right to AI," tax modernization, a public wealth fund, and "efficiency dividends." Nathaniel Whittemore's read: genuinely good conversation-starters landing in the worst possible PR environment.[7]AI Daily Brief — OpenAI Proposes a New Deal 55% of Americans now believe AI will do more harm than good in their daily lives (up 11 pp YoY), 70% expect it to reduce jobs, and only 7% expect it to add them. A ten-to-one pessimism ratio.
Quinnipiac poll (~01:05): 55% expect net harm from AI (a majority for the first time); 70% expect fewer jobs; 30% are personally worried about job obsolescence. Yet adoption is rocketing — research use is up from 37% to 51%, and never-used is down from 33% to 27%. Younger Americans are the most AI-literate and the most pessimistic about jobs; fluency and optimism are moving in opposite directions.
AI has worse PR right now than the extremely controversial ICE.
Core critique (~03:40): the AI industry spends ~75% of every communication validating concerns and ~25% on why any of this should exist in the first place. That ratio is backwards, and the industry's default answer ("it's happening anyway, China") doesn't close the loop.
When those companies' answer is, "Well, it's happening one way or another," and don't respond when people say, "Wait, but why?" the people are left to assume the answer is because it's going to make some people rich.
Wil Manidis's savage point (~19:20): OpenAI could have put its money where its mouth is in the document itself. Propose higher capital-gains taxes? Commit to paying them. Propose a public wealth fund? Seed it. Propose data centers pay their own energy costs? Accept voluntary rate separation. Propose public-benefit governance? Reinstate the profit caps that OpenAI dismantled 6 months ago. None of those commitments are in the document.
The only things in the document are a workshop, fellowships paid in the company's own product, and an email address that routes to no one.
Daniel Jeffries quoted at length (~08:05): "We're not giving birth to magic super miracle machines that suddenly invalidate every single pattern of the entirety of human history. Can we please just let AI be cool and useful and problematic in realistic ways instead of all this crazy talk?"
Right now, with where things are, every single time any leader or senior official from any major lab speaks, they are either contributing to the strong sentiment that AI is likely to be worse than it is good, or they are doing work to reverse that sentiment.
Meta Superintelligence Labs released Muse Spark — proprietary (not open), no API, accessible only through meta.ai. Sam Witteveen reads it as a defensible model for Meta's own WhatsApp/Facebook use cases, one that would have been celebrated had it shipped six months earlier.[8]Sam Witteveen — Muse Spark: Meta's NEW Llama Replacement Artificial Analysis puts it in its top 5 — behind Gemini, GPT-5.4, and Opus 4.6 — and second-most-capable on vision. Meta's own benchmarks show Muse Spark Thinking winning 3 tests and losing 3, including last place on Humanity's Last Exam (a Scale AI benchmark, awkwardly).
One year ago today, Llama 4 shipped on a Saturday afternoon and landed with a thud — the Behemoth preview never materialized openly (~01:55). Zuckerberg blew up the Meta AI org, paid $14B to bring Alexandr Wang in via Scale AI, and assembled Meta Superintelligence Labs with reported nine-figure comp packages for top researchers. Nine months later, Muse Spark is the first output.
Meta's own numbers (color-coded by a Twitter user, per Sam): Muse Spark Thinking is best on 3 benchmarks and last on 3. The notable loss is Humanity's Last Exam with tools — the Scale AI benchmark, from Alexandr Wang's own pre-Meta company (~08:05). Artificial Analysis puts it in the top 5: ahead of Claude Sonnet, GLM 5.1, and MiniMax 2.7, behind only Gemini, GPT-5.4, and Opus 4.6. It's token-efficient for its intelligence tier; agentic performance is weaker.
This is not a bad model. Had this model come out late last year, everyone would be raving about this model.
The real strategic question (~09:10): will Meta ever open-weight this, or is it purely a WhatsApp/Facebook/Instagram model? If the former, the rebuilt pre-training stack could let Meta iterate at frontier-lab cadence for the first time. If the latter, Llama is effectively dead and Qwen / Gemma / Kimi / MiniMax own the open-weight frontier — with Qwen 3.7 reportedly imminent.
The Gemini app can now generate live, manipulable 3D models and interactive simulations directly inside chat — not static diagrams, but objects with sliders for real parameters. Example: a moon-orbit simulation where you adjust initial velocity and gravity and see whether the orbit stabilizes.[9]Google Gemini App — Interactive simulations and models Rolling out globally on the Pro model; not available on Education or Workspace accounts.
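For a sense of what the sliders actually control in the moon-orbit example, here is a tiny standalone sketch (illustrative units, and nothing to do with Google's generated code) of how initial velocity and the gravitational parameter decide whether the orbit stays bound:

```python
import math

def orbit_is_bound(v0, mu, r0=1.0, steps=100_000, dt=1e-3):
    """Semi-implicit Euler integration of a point mass around a central body.
    v0 is the initial tangential speed, mu the gravitational parameter."""
    x, y = r0, 0.0        # start on the x-axis
    vx, vy = 0.0, v0      # purely tangential initial velocity
    for _ in range(steps):
        r = math.hypot(x, y)
        if r > 10 * r0:   # wandered far outside the start radius: escaped
            return False
        ax, ay = -mu * x / r**3, -mu * y / r**3   # inverse-square gravity
        vx += ax * dt
        vy += ay * dt
        x += vx * dt      # position update uses the new velocity (symplectic)
        y += vy * dt
    return True

mu = 1.0
v_circ = math.sqrt(mu)             # circular-orbit speed at r0 = 1
v_esc = math.sqrt(2 * mu)          # escape speed at r0 = 1
print(orbit_is_bound(v_circ, mu))        # True: orbit stabilizes
print(orbit_is_bound(1.1 * v_esc, mu))   # False: flies off
```

In the Gemini feature, those two inputs are the sliders; the re-render on each drag is the part that used to require a hand-built tutorial page.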
Launch examples: molecular rotation, orbital mechanics, fractal visualization, double-slit experiment, double pendulum. Trigger phrases Google suggests: "show me," "help me visualize." The notable thing isn't the simulations themselves — any tutorial site has these — it's that they're generated on demand to match whatever the user is currently asking about. Closest analog is ChatGPT's Canvas + code interpreter, but Google is front-ending it as a consumer science-education feature.
The Education/Workspace exclusion is the odd choice, since schools would be the obvious primary use case; it suggests this is still in a preview-ish state with rate/cost constraints on the generation step.
The headline finding in this week's Data Science Weekly: a randomized controlled trial (N=1,222) shows AI assistance improves short-term task performance (math, reading comprehension) but causes worse subsequent performance without AI and increased give-up behavior — effects visible after just 10 minutes of AI interaction.[10]Data Science Weekly — Issue 646
It's the strongest empirical data point yet for the intuition behind Nate B Jones's "dark code" framing: if AI assistance measurably erodes independent performance within a 10-minute session, the compounding effect across a codebase where humans never wrote the original code becomes a capability problem, not a tooling problem. The 2025–26 productivity literature has mostly celebrated short-term AI gains; this RCT is the first big study to quantify the tradeoff.
A one-minute checklist from Arjay McCandless. Ask yourself: (1) What's the goal — making money means a simple stack and a cheap VPS; learning means whatever the target job expects. (2) How many users will actually show up — that drives scale, cost, and monitoring. (3) What breaks first — database, server, or wallet. (4) What you can't build yet — knowing what not to build is as important as knowing what to build. (5) How you'll measure whether it worked.[11]Arjay McCandless — Do this BEFORE your next project
The clip on DeepMind's Experience AI classroom program is short and promotional — students and teachers working through "what is bias," "how does AI understand me," and "how many images do we need to train a model" as structured lesson discussions.[12]Google DeepMind — Teaching the foundations of AI in the classroom Positioned as the "mass-scale AI fluency infrastructure" piece that OpenAI's policy document (topic 5) calls for but doesn't actually propose funding.
Morning Brew's two April 9 highlights — both blocked from automated fetching. (1) Both Iran and the US are claiming victory over conflicting ceasefire agreements, suggesting whatever de-escalation exists is being interpreted very differently by each side.[13]Morning Brew — Both Iran and the US claim victory over conflicting ceasefire agreements (2) The NYT claims to have identified Satoshi Nakamoto — Bitcoin's biggest unsolved mystery.[14]Morning Brew — NYT thinks it solved Bitcoin's biggest mystery
Morning Brew blocks scraping, so full article bodies aren't available. Title-only summaries here for completeness. The Iran/US story is a direct continuation of the Hormuz-blockade narrative from 4/13's briefing, but in the opposite direction — some version of a ceasefire has been reached, with both sides posturing about who got what. The NYT Satoshi story follows a long tradition of such attempted unmaskings (Dorian Nakamoto, Craig Wright, HBO's Peter Todd claim); readers should consult the NYT piece directly for the evidence.