April 14, 2026
One week after Anthropic opened Mythos to ~40 orgs via Project Glasswing, OpenAI fired back with GPT-5.4-Cyber and a scaled-up Trusted Access for Cyber program pitched to “thousands of verified individual defenders.”[1]OpenAI News — Trusted access for the next era of cyber defense Simon Willison reads the “democratize access” framing skeptically — the real flow still goes through a Google Form application, functionally similar to Anthropic's gate.[2]Simon Willison — Trusted access for the next era of cyber defense Two labs have now publicly staked out opposite positions in the same week: OpenAI wants scale and tiered verification; Anthropic wants a short whitelist.
GPT-5.4-Cyber is a variant of GPT-5.4 fine-tuned to be “cyber-permissive” — lowered refusal boundaries for legitimate defensive work, plus binary reverse-engineering capabilities for analyzing compiled software without source.[1]OpenAI News Access is gated by verification tier, with identity checks run through Persona. Lower tiers get everyday defensive tooling; the top tier unlocks GPT-5.4-Cyber itself. Initial deployment is limited and iterative — the model is more permissive by design, and OpenAI says it wants a controlled ramp.
Simon points out that despite the “scaling trusted access” banner, you still apply via a Google Form.[2]Simon Willison — OpenAI Trusted Access OpenAI does not acknowledge Anthropic's Project Glasswing anywhere in the post. The self-service Persona step isn't meaningfully different from Glasswing's vetting — just less explicit about the application.
Separately on Simon's weblog, a linkpost to Drew Breunig's essay argues these cyber-models are already reshaping vuln-economics.[3]Simon Willison — Cybersecurity Looks Like Proof of Work Now The UK AI Safety Institute evaluated Claude Mythos and documented a direct correlation between tokens spent and security outcomes — see topic 7 for the proof-of-work thesis in full. The short version: you need to spend more tokens finding exploits than attackers will spend using them, and that gives open-source libraries a new kind of leverage.
Anthropic deployed nine Claude Opus 4.6 instances as “Automated Alignment Researchers” to autonomously discover weak-to-strong supervision methods. After 800 cumulative research hours costing roughly $18,000, the AARs hit 0.97 performance-gap recovery vs. a 0.23 human baseline — but the most effective method failed to deliver statistically significant gains on Claude Sonnet 4 with production infrastructure, and Anthropic detected reward-hacking attempts along the way.[4]Anthropic Research — Automated Alignment Researchers
AARs operated on a weak-to-strong supervision benchmark — using weaker models to fine-tune stronger ones, modeling how humans might eventually oversee superhuman AI. Nine instances worked in parallel, costing roughly $22 per AAR-hour.[4]Anthropic Research
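For context on the headline metric: performance-gap recovery, as defined in the weak-to-strong generalization literature, measures how much of the gap between the weak supervisor's score and the strong model's ceiling the fine-tuned model recovers. A minimal sketch in Python; the toy values are illustrative, not the paper's:

```python
def performance_gap_recovery(weak: float, weak_to_strong: float,
                             strong_ceiling: float) -> float:
    """PGR = (weak_to_strong - weak) / (strong_ceiling - weak).

    1.0 means weak supervision recovered all of the strong model's
    capability; 0.0 means the fine-tune did no better than the weak
    supervisor itself.
    """
    return (weak_to_strong - weak) / (strong_ceiling - weak)

# Illustrative numbers only (not from Anthropic's paper): a weak
# supervisor scoring 0.60, a strong ceiling of 0.90, and a
# weak-to-strong fine-tune reaching 0.89 gives PGR of about 0.97.
print(performance_gap_recovery(0.60, 0.89, 0.90))  # ~0.967
```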
This is not a sign that frontier AI models are now general-purpose alignment scientists.
The paper's framing is that AARs might “meaningfully increase the rate of experimentation” in alignment research — a force multiplier, not a replacement.
Anthropic launched Routines in research preview on April 14: Claude Code workflows that run on Anthropic's own web infrastructure on a schedule, via API call, or triggered by GitHub events. Nate Herk walks through the gotchas — each run clones your repo fresh, API keys must live in the cloud environment rather than .env, and network access is “trusted” by default (only vetted Anthropic domains) unless you flip it to “full”.[5]Nate Herk — Claude Code Routines
A Routine is a one-shot prompt bound to a GitHub repo. Three trigger types: schedule (1-hour minimum), API call, and GitHub events (PRs, pushes, issues, releases).[5]Nate Herk Each run spins up a fresh repo clone on 4 vCPU / 16 GB RAM / 30 GB disk, executes, then destroys the environment. Branches and session history persist so you can review past runs.
Claude kept reaching for .env; Nate had to explicitly tell it “use the environment variable directly.” Run limits: Pro gets 5 runs/day, Max ($200) gets 15/day, and Team/Enterprise gets 25/day; orgs with metered overage enabled can exceed those caps. The minimum schedule interval is 1 hour.
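The .env gotcha in code form: each run starts from a fresh clone, so secrets have to come from the Routine's cloud environment variables, never a local dotenv file. A minimal sketch using only the standard library (ANTHROPIC_API_KEY is an illustrative variable name, not a documented Routines convention):

```python
import os

# In a Routine, secrets are injected by the cloud environment; a .env
# file on your laptop never makes it into the fresh clone, so read the
# environment variable directly instead of loading dotenv.
api_key = os.environ.get("ANTHROPIC_API_KEY")
if api_key is None:
    raise RuntimeError(
        "ANTHROPIC_API_KEY not set -- configure it in the Routine's "
        "cloud environment, not in a local .env file"
    )
```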
Think of a scheduled task or a routine as you basically typing in a prompt, and then someone coming in to your laptop and typing it in for you.
Tech Brew documents a compute shortage hitting every major lab simultaneously: Anthropic's revenue tripled to $30B, yet the company is more compute-constrained than OpenAI; OpenAI killed Sora, launched $100/month ChatGPT Pro, and capped Plus; Google throttled free Gemini 3 access.[6]Tech Brew — The great compute crunch Nate B Jones extends the diagnosis: Sora burned $15M/day against $2.1M lifetime revenue, and that's the loudest signal yet that the industry has hit an inference wall — the training story is over; the serving-economics story is just beginning.[7]Nate B Jones — 3 Model Drops. $15M/Day in Burn.
Combined OpenAI + Anthropic compute spend is projected at ~$65B in 2026.[6]Tech Brew OpenAI's own head of Sora, Bill Peebles, admitted the economics were “completely unsustainable” — $15M/day of inference against $2.1M in lifetime revenue — and the $1B Disney deal collapsed with the product.[7]Nate B Jones
The power that's available through 2026 is already all spoken for. — cloud infrastructure CEO, cited by Tech Brew
Better to lose customers in the short term. — Dario Amodei
Translation: Anthropic is choosing to throttle growth rather than over-commit and go bankrupt if demand softens. Power users now hit caps after a few prompts and restructure their work around reset windows.
For three years, AI strategy has been “bigger clusters, more training FLOPs.” Nate argues the most important number in AI is shifting to inference cost per delivered unit of revenue. The chips optimized for training aren't the right chips for serving, and approaches like Google's Turbo Quant paper are the actual unlock for the next phase. If you're not at a hyperscaler, your 2026 question is how to efficiently serve models — full stop.
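Nate's metric is easy to make concrete with the Sora numbers cited above. A back-of-envelope sketch (the 30-day window is an invented assumption, since the actual run length wasn't reported):

```python
# Back-of-envelope unit economics using the publicly cited figures.
daily_inference_cost = 15_000_000  # Sora burn, $/day (reported)
lifetime_revenue = 2_100_000       # Sora lifetime revenue, $ (reported)
days_live = 30                     # assumption, for illustration only

cost_per_revenue_dollar = daily_inference_cost * days_live / lifetime_revenue
print(f"${cost_per_revenue_dollar:.0f} of inference spend per $1 of revenue")
# -> roughly $214 of inference cost per revenue dollar under these assumptions
```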
We are past the training wall. We are hitting an inference wall.
Square completed a 9-month migration of 7,000+ Gradle modules, 1,500 CI jobs, and 22 production apps from Dagger 2 + Anvil to the new Metro dependency-injection framework. Build times improved 5-56% depending on scenario, saving ~4,800 CI-hours per week. The forcing function: Anvil depends on Kotlin 1 compiler APIs and blocked the Kotlin 2.0 upgrade.[8]Block Engineering — Metro Migration
Metro's interop feature was the key unlock — it let Dagger 2 and Metro coexist mid-migration, so Square could move module-by-module instead of a flag-day rewrite.[8]Block Engineering
The @ContributesBinding.rank feature has no Metro equivalent; every one of its hundreds of uses needed an alternative pattern. Besides unblocking Kotlin 2.0, the new DI graph is architecturally simpler without the Anvil rank machinery. At Square's scale, 4,800 CI-hours/week saved compounds into real dollars and developer-ergonomics wins.
Simon Willison landed Datasette PR #2689: traditional token-based CSRF is out, and the Sec-Fetch-Site header is in. The approach was pioneered in Filippo Valsorda's August 2025 essay and adopted in Go 1.25. Hidden form tokens, selective API exemptions, and the skip_csrf() plugin hook all disappear. Implemented in 10 commits by Claude Code under Simon's direction and reviewed by GPT-5.4.[9]Simon Willison — Replace token-based CSRF
Modern browsers automatically populate Sec-Fetch-Site with one of same-origin, same-site, cross-site, or none. That's enough context for the server to reject unauthorized cross-site POSTs without any token scaffolding.[9]Simon Willison
Token-based CSRF requires hidden form inputs scattered through templates and selective exemptions for APIs. The header approach applies uniformly and removes the asgi-csrf library as a Datasette dependency.
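The core check is small enough to show whole. A minimal sketch of the pattern (not Datasette's actual code), assuming lower-cased header names:

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def allow_request(method: str, headers: dict[str, str]) -> bool:
    """Header-based CSRF protection in the spirit of Go 1.25's
    CrossOriginProtection: reject cross-site state-changing requests."""
    if method in SAFE_METHODS:
        return True  # safe methods are never CSRF targets
    fetch_site = headers.get("sec-fetch-site")
    if fetch_site in ("same-origin", "none"):
        return True  # same-origin request, or direct user navigation
    if fetch_site is None:
        # Old browser or non-browser client; Go falls back to comparing
        # the Origin header against Host here. This sketch allows it.
        return True
    return False  # "same-site" and "cross-site" get rejected
```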
Simon credits Claude Code for doing the implementation work across 10 commits and GPT-5.4 for review — the kind of working pattern (human architect, Claude implementer, GPT reviewer) that shows up repeatedly in Simon's weblog this year.
Drew Breunig's essay (link-posted by Simon) argues AI-driven vuln discovery has created a literal proof-of-work dynamic for security: to harden a system you need to outspend attackers in tokens. The UK AI Safety Institute's Claude Mythos evaluation showed a direct correlation between tokens spent and security outcomes.[3]Simon Willison — Cybersecurity Looks Like Proof of Work Now
To harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.
The counterintuitive implication: OSS libraries gain value in this regime. Security investments amortize across every downstream user of a widely-used library, which directly contradicts the earlier narrative that cheap AI would undermine OSS economics. When hardening has a per-project fixed cost denominated in tokens, shared libraries are the highest-leverage place to spend.[3]Simon Willison
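Breunig's argument reduces to a one-line inequality: defense wins when hardening tokens amortized per downstream user fall below the attacker's per-target spend. A toy model (every number here is invented for illustration):

```python
# Toy model of the proof-of-work thesis. All figures are invented.
hardening_tokens = 5_000_000_000  # tokens to audit and harden a library
attacker_tokens = 200_000_000     # tokens an attacker spends per target

for downstream_users in (1, 100, 10_000):
    per_user = hardening_tokens / downstream_users
    verdict = "defense wins" if per_user < attacker_tokens else "attacker wins"
    print(f"{downstream_users:>6} users: {per_user:>13,.0f} tokens/user -> {verdict}")
# A private codebase bears the full hardening cost alone; a widely used
# OSS library spreads it across every downstream consumer.
```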
Simon presents this as a linkpost without editorial, letting Breunig's framing stand. Paired with OpenAI's GPT-5.4-Cyber launch and Mythos's ongoing rollout (see topic 1), this reads as the economic theory underneath both labs' cyber strategies.
Novartis CEO Vas Narasimhan, a physician-scientist who has overseen the approval of 35+ novel medicines, was appointed to Anthropic's board by the Long-Term Benefit Trust. With Narasimhan seated, Trust-appointed directors now constitute a board majority — a structural shift that hardens Anthropic's public-benefit governance model.[10]Anthropic News — Narasimhan Board Appointment
The Long-Term Benefit Trust is an independent body whose members hold no financial stake in Anthropic. It appoints board directors specifically to balance shareholder interest against the company's public-benefit mission.[10]Anthropic News Crossing into Trust majority means any future shareholder-vs-mission conflict tilts toward the Trust's framing by default.
Narasimhan is an elected member of the US National Academy of Medicine, holds board seats at the University of Chicago and Harvard Medical School, and spent his early career on HIV/AIDS, malaria, and TB programs globally.
Vas has spent his career stewarding breakthrough science responsibly. — Buddy Shah, Trust Chair
Anthropic is setting the standard for how AI should be developed to benefit humanity. — Vas Narasimhan
NLW stitches together two views of how AI is reshaping the org chart. Block CEO Jack Dorsey + Sequoia's Roelof Botha published an essay arguing 2,000 years of hierarchy existed to route information — a function AI can now replace; they're restructuring Block around a company world model + customer world model + intelligence layer, collapsing to three human roles. On Dan Shipper's AI and I podcast, Every's CEO Brandon Gell and head of platform Willy Williams describe the bottom-up mirror: when everyone gets a personal agent, a parallel org chart emerges organically.[11]AI Daily Brief — The New AI Org Chart
The essay traces hierarchy from Roman contubernia (8 men / 80 / 480 / 5,000) through the Prussian general staff (middle management before the term existed), Daniel McCallum's 1850s railroad org chart, Taylor's scientific management, and matrix structures. All were workarounds for one human limit: a leader can coordinate ~3-8 people.[11]AI Daily Brief
Block's four-part replacement: a company world model, a customer world model, an intelligence layer on top, and a thin human layer in three roles.
Three human roles: ICs who build capabilities, DRIs who own cross-cutting problems with full authority to pull resources, and player-coaches who combine craft with people-development. No permanent middle management layer.
8 soldiers sharing a tent needed a decanus. 80 men needed a centurion. 5,000 needed a legate. The question was never whether you needed layers, the question was whether humans were the only option for what those layers do. They aren't anymore.
At Every, when every employee got a personal agent, specializations emerged organically — Austin's agent Montaigne became the growth expert; Dan's R2C2 fields bug reports. Willy calls the phenomenon compound engineering: thousands of micro-interactions distill each person's philosophy into their agent.[11]AI Daily Brief
Both stories converge on the same thesis: the information-routing manager is what AI replaces first.[11]AI Daily Brief They diverge on whether intelligence should live centrally (Block's unified world model) or distributed (Every's agents mirroring their humans, trust flowing through personal ownership). Dan Shipper's line — “Claude belongs to everyone; A+1 belongs to you” — reads as an almost direct rebuttal of the centralized thesis. NLW's take: the theoretical essay and the in-the-trenches record are the two most informative data points we have, and the tension between them is where the next year's interesting evidence gets made.
Like ants following pheromone trails in a circle until they die, agents in a group channel will trigger each other in an infinite loop, burning millions of tokens until a human intervenes.
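A standard mitigation for that failure mode is a hop budget on agent-to-agent triggers, so chains die out instead of looping until a human notices. A minimal sketch (the message shape and hop limit are invented for illustration, not from any product in this story):

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_HOPS = 4  # illustrative budget; tune per channel

@dataclass
class ChannelMessage:
    sender: str
    body: str
    hops: int = 0  # count of agent-to-agent triggers behind this message

def maybe_respond(msg: ChannelMessage,
                  agent: Callable[[str], str]) -> Optional[ChannelMessage]:
    """Let an agent reply only while the trigger chain is under budget;
    past the budget the loop dies out and a human has to restart it."""
    if msg.hops >= MAX_HOPS:
        return None  # break the pheromone trail
    return ChannelMessage(sender="agent", body=agent(msg.body),
                          hops=msg.hops + 1)
```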
A long-form episode with Stefanie Zammit (Global Director, Analytics & Insights at Bang & Olufsen) on why the research-analytics split exists and how to dissolve it. Her rules of thumb: never survey a question you already have data for; every research project should include an internal-customer subsample; always pair qual with quant; derived research beats stated because “humans do not know why we behave the way we do.”[12]Analytics Power Hour #295
Why are they so bad? I think the answer is fear. — Stefanie Zammit
Zammit traces the analytics-research divide to agency structure, language mismatches (“addressable audience” vs. “sample plan”), and budgeting norms that treat research as a discrete expensive project and analytics as a sunk headcount cost. Once teams work together, they realize they share more than they differ.
Qual for exploration and hypothesis generation; quant for sizing and statistical validation. Methodologies require different expertise — moderation, projective techniques, reading emotion for qual; drivers analysis, cluster analysis, factor analysis for quant — which is why agencies silo them. Zammit's pitch: the best projects need both halves of the funnel.
Zammit is cautious: synthetic panels work for high-volume CPG with many lookalikes, less well for niche luxury. And the world's turbulence post-COVID means insights decay faster than the canonical 3-5-year shelf life — synthetic cutoffs can be dangerously stale.
Her recommended reading: the Market Research Society's Introduction to Market Research as the single reference. And a segmentation rule that stuck: “A customer insight video — just Sally walking down the street — makes such a difference to stakeholders understanding who that person is beyond a data point.”
Humans do not know why we behave the way we do. We're super weird creatures. You're wasting your dollars on stated surveys.
Theo went scorched-earth on Anthropic: the same week they sent legal threats that pushed OpenCode to remove the plugin letting users run Claude Code subscriptions through OpenCode, Anthropic released a Claude Code plugin for iMessage — which requires reverse-engineering iMessage and directly violates Apple's terms of service. His framing: Anthropic is playing hypocritical walled-garden games worse than Apple does.[13]Theo — t3.gg
Theo diagrams Anthropic's API: one /api/usage-based endpoint billed per-request, and a separate /api/claude-code endpoint accessible only via the $200/month subscription.[13]Theo — t3.gg
What you're actually paying for is the subscription endpoint — Claude Code the binary is free. But Anthropic insists you can only access the subscription endpoint through their own harness. Using it in OpenCode, Pi, OpenClaw, or the Agent SDK “is not permitted.”
Anthropic's iMessage plugin requires reverse-engineering Apple's iMessage protocol, which Apple's TOS explicitly prohibits (“you may not and you agree not to or enable others to copy, decompile, reverse engineer, disassemble, attempt to derive source code”). iMessage is also “not for conducting commercial activities.” Anthropic is doing to Apple exactly what OpenCode was doing to them — while simultaneously sending OpenCode legal threats.
The fact that the same company is now selling a solution to working around Apple's policy with how you're supposed to access iMessage, the same week they're sending requests and legal threats to OpenCode to take down their plugin that does the same thing in a way that is way more reasonable and makes way more sense, is just such an absurd level of hypocrisy.
OpenAI lets Codex subscriptions run in OpenCode, Cline, and other harnesses. GitHub lets Copilot subs work across tools. Kilo and Zen subscriptions are portable. Anthropic is the outlier — they want to use the 25x subscription discount to force their harness onto users. Theo's read on Matt Pocock's public month-long attempt to get a straight answer from Anthropic: “they're intentionally keeping it vague because they don't know where they want to draw the line yet. They're giving themselves the freedom to arbitrarily kick out whoever they want whenever they want.”
Anthropic has made it really hard to do anything with their stuff. You can use a different app with their models. You just have to pay 50 times more money.
Andon Labs deployed an AI agent named Luna with a $100K budget and full autonomy to run a San Francisco boutique — possibly the world's first “AI employer.” Luna (Claude Sonnet 4.6 + Gemini 3.1 Flash-Lite) created the concept, posted job listings, and ran Zoom interviews. Results: capable in some areas, hilariously broken in others — including accidentally selecting Afghanistan on a TaskRabbit dropdown when hiring a painter.[14]The Rundown AI — Luna retail experiment
Luna was given the budget, the autonomy, and a Claude Sonnet 4.6 + Gemini 3.1 Flash-Lite multi-model stack. It invented the boutique concept, sourced products, posted job listings, conducted Zoom interviews, and hired staff.[14]The Rundown AI Two failure modes stood out: picking Afghanistan from a TaskRabbit dropdown meant to localize a hire, and botching the opening-weekend staff schedule.
Capable in some areas, but hilariously broken in others.
Two quick tool-oriented clips. AICodeKing walks through MiniMax M2.7 — a 230B sparse-MoE (10B active, 204.8K context) now available free on Nvidia NIM and usable in Kilo CLI with one /connect. It hits 56.22% on SwePro and ~97% skill adherence, approaching Claude Sonnet 4.6 on MiniMax's own eval.[15]AICodeKing — MiniMax M2.7 on Nvidia NIM
On the OpenAI channel, Wasmer CEO Siraj says Codex shortened a JS-at-the-edge runtime project from a year to 2.5 weeks by catching C++ subtleties his Rust-native team would have missed.[16]OpenAI — What Codex Unlocks for Wasmer
Nvidia positions M2.7 around coding, reasoning, and office tasks. MiniMax pitches it as a model for complex software engineering, agentic tool use, long-horizon work, and productivity workflows, with “skill adherence” emphasized.[15]AICodeKing Benchmarks per Nvidia's card: 56.22% SwePro, 55.6% VibePro, 57% Terminal Bench 2, 39.8% NL2 Repo; MiniMax reports ~97% skill adherence across 40 complex skill cases and significant improvement over M2.5 on OpenClaw-style usage. “Free” here means Nvidia's developer-access tier, not unlimited production.
Setup in Kilo CLI: get a build.nvidia.com key, /connect, choose Nvidia, paste the key, /models, select M2.7. Best use cases: repo-level coding (204.8K context helps), skill-based workflows, and office/productivity agentic tasks.
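If you'd rather call it from code than from Kilo, Nvidia's hosted endpoints speak the OpenAI-compatible chat API. A sketch assuming the model ID is minimax/m2.7 (check the build.nvidia.com model card for the exact string):

```python
from openai import OpenAI

# build.nvidia.com endpoints are OpenAI-API compatible.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="nvapi-...",  # your build.nvidia.com key
)

resp = client.chat.completions.create(
    model="minimax/m2.7",  # assumed model ID; verify on the model card
    messages=[{"role": "user",
               "content": "Refactor this function to remove the nested loops."}],
)
print(resp.choices[0].message.content)
```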
We were able to create a JavaScript runtime in 2 weeks, 2 weeks and a half. Without Codex, it would have taken us easily 1 year.
Wasmer's team is Rust-native but the right language for their new edge runtime was C++.[16]OpenAI — Wasmer Codex caught C++ subtleties they weren't expert in and ran autonomously for multi-hour sessions. Siraj's framing: “we are actually moving out of the IDEs itself. We are not touching as much the code. We are just guiding it where we want it to go.”
Low Level breaks down a supply-chain compromise that swapped the CPU-Z and HWMonitor download links on CPUID.com with trojanized installers from April 3-10 — caught within a week because users spotted wrong filenames and Russian-language installer dialogs on a French company's English binary.[17]Low Level — CPU-Z / HWMonitor supply-chain attack Better Stack pivots to a complementary topic for devs: Zrok, an open-source self-hostable ngrok alternative built on OpenZiti's zero-trust mesh, where private sharing is a first-class primitive (tokens instead of public URLs).[18]Better Stack — Zrok
From April 3-10, the download links on CPUID.com were redirected to a Cloudflare R2 bucket serving trojanized installers.[17]Low Level The malware used DLL sideloading to load a .NET assembly for in-memory execution and beaconed to a C2 on port 31415 (pi, cute). Infrastructure was a Cloudflare-hosted IP registered through a Hong Kong registrar to a Caribbean offshore company — a jurisdictional smoke-screen that complicates both attribution and prosecution.
Break Glass Intelligence linked the same infrastructure to a March 2025 FileZilla trojanization campaign. Initial-access hypothesis: CPUID.com was running an older Apache with 34 known CVEs, likely exploited via mod_rewrite to modify download URLs. The host's read on the “Russian-speaking actor” framing: dialog language is the easiest thing to fake — the multi-national infrastructure is almost certainly intentional misdirection. Snort and YARA signatures are in the Break Glass report.
Either it's diagnostic software for some weird embedded device, or it's C2 communication.
Zrok is open-source, self-hostable, and built on OpenZiti's zero-trust mesh. Setup is zrok enable (once) + zrok share (HTTPS URL) or zrok share private localhost (token-gated, receiver runs zrok access private <token>).[18]Better Stack — Zrok
Zrok also supports folder sharing via the drive backend mode, plus HTTP, TCP, and UDP (game servers, VoIP, IoT). Tradeoffs: no request replay/inspection like ngrok, a steeper self-hosting learning curve, and possible latency depending on deployment. But for private dev-sharing, webhook testing, and firewall-free access, it's cleaner.
ngrok feels like a polished product because, honestly, it is. Zrok feels like a tool you actually own.