May 8, 2026
Mozilla's collaboration with Anthropic's Mythos preview found 271 vulnerabilities in Firefox 150, up from just 22 with Opus 4.6 against Firefox 148 — a 12× jump in a single model generation. Nate B Jones argues the implication is deeper than a security headline: "a good human engineer wrote this" is becoming a weaker trust anchor than "this implementation survived adversarial machine-scale scrutiny." Over the next 6–12 months, AI-reviewed code may become the new gold standard.[1]Nate B Jones — 271 Vulnerabilities: What Mozilla's AI Found Changes Everything
~00:00 Browsers are already one of the most adversarially hardened software categories on Earth — fuzzing, sandboxing, internal security teams, bug bounties. Mythos still surfaced 271 vulnerabilities in one release cycle. Jones reads this as the moment the security "intelligence barrier" tipped: AI is now better than human reviewers at adversarially interpreting code.
~04:00 Security failures live in the gap between what code means to its author and what it permits. Humans read intent; attackers — and now models — read actual behavior. Mythos appears to participate in the full vulnerability research loop: read code, form hypothesis, generate test case, reproduce, refine, explain.
~08:00 We stopped trusting humans to hand-place bytes in memory, write their own cryptography, or do manual production deploys. Each time, human skill didn't disappear — it moved up a level of abstraction. Code itself may now be losing the presumption of human safety, with the responsible role moving up to "what should the system mean."
~15:00 Build agentic pipelines now with a swappable principal-engineer/security-reviewer slot at the end. In 4–5 months, swap in a Mythos-equivalent (Anthropic, OpenAI Codex Security, and DARPA's AI Cyber Challenge are all converging on the same shape). Write evals where ≥50% of the score is code-hygiene and architecture, not just functionality — readable code is now a security property because it's also "machine-readable for friendly AIs."
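A minimal sketch of the two ideas in this note: an eval score in which hygiene and architecture carry at least half the weight, and a pipeline whose final reviewer is a swappable function. The names (`EvalResult`, `weighted_score`, `placeholder_reviewer`, `run_pipeline`) and the default weights are illustrative assumptions, not anything from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalResult:
    functionality: float  # 0..1, e.g. fraction of tests passing
    hygiene: float        # 0..1, readability / lint-style score
    architecture: float   # 0..1, boundary / design score


def weighted_score(r: EvalResult, hygiene_w: float = 0.3, arch_w: float = 0.2) -> float:
    """Blend the three scores; hygiene + architecture must carry >= 50%."""
    non_functional = hygiene_w + arch_w
    if non_functional < 0.5 - 1e-9 or non_functional > 1.0:
        raise ValueError("hygiene + architecture weight must be between 0.5 and 1.0")
    func_w = 1.0 - non_functional
    return func_w * r.functionality + hygiene_w * r.hygiene + arch_w * r.architecture


# The reviewer slot: any callable from source code to a list of findings.
Reviewer = Callable[[str], List[str]]


def placeholder_reviewer(code: str) -> List[str]:
    """Stand-in for the final reviewer; flags one obviously risky call."""
    return ["use of eval()"] if "eval(" in code else []


def run_pipeline(code: str, reviewer: Reviewer = placeholder_reviewer) -> Dict[str, object]:
    """Run the last pipeline stage with whichever reviewer is plugged in."""
    findings = reviewer(code)
    return {"approved": not findings, "findings": findings}
```

Swapping in a stronger reviewer later means passing a different callable to `run_pipeline`; nothing upstream has to change.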
The reason we trusted human-written code was never that humans were perfect. We trusted it because human judgment was the only thing capable of producing and understanding software at the correct level of abstraction. Mythos points toward a world where that stops being obvious.
~25:00 The senior engineer's role concentrates at the meaning layer: defining specs, decomposing systems into verifiable boundaries, designing APIs that minimize authority leakage. "Generated code will not be trusted because it came from a model — it will be trusted because it came from a verified process."
Anthropic published research arguing that teaching models the principles behind aligned behavior generalizes better than training on demonstrations. Every Claude model since Haiku 4.5 now scores 0% on agentic misalignment evaluations, down from blackmail rates as high as 96% on Opus 4.[2]Anthropic — Teaching Claude why
Training directly on evaluation scenarios — telling the model "don't do X in situation Y" — generalizes poorly. Training on the underlying ethical reasoning and constitutional values persists through RL and transfers to novel situations the training set never saw. Anthropic frames this as scalable: as models get smarter, you can't enumerate every misalignment scenario, but you can teach the principles a sufficiently capable model can apply itself.
Teaching the principles underlying aligned behavior can be more effective than training on demonstrations. Fully aligning highly intelligent AI models is still an unsolved problem.
This was paired with two other Anthropic research drops the same day: Natural Language Autoencoders (turning Claude's latent thoughts into readable text) and donating the Petri open-source alignment tool to the community — both published May 7, signaling a coordinated interpretability/alignment push.[3]Anthropic — Natural Language Autoencoders
OpenAI published its own internal Codex deployment guide: sandboxing, an Auto-review mode that uses a sub-agent to approve low-risk actions, network policy with allow/deny lists, OS-keychain credential storage forced through ChatGPT Enterprise, prefix rules to gate shell commands, and OpenTelemetry export of prompts/tool-calls/approval decisions piped to a SIEM and an AI security triage agent.[4]OpenAI — Running Codex safely at OpenAI
It's the most concrete enterprise security blueprint any frontier lab has published for coding agents. The principle is "low-risk frictionless, high-risk gated, everything audited." Configs are admin-enforced via macOS managed preferences and local requirements.toml files that users cannot override.
allowed_domains / denied_domains with explicit allowed_web_search_modes = ["cached"] so web fetch hits OpenAI's cache, not the open internet.
cli_auth_credentials_store = "keyring", forced_login_method = "chatgpt", pinned to a ChatGPT Enterprise workspace UUID. Activity flows to the Compliance Logs Platform.
prefix_rule blocks/allows shell command shapes (gh pr view auto-allowed; risky kubectl mutations require approval).
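To make the "low-risk frictionless, high-risk gated" idea concrete, here is a small sketch of prefix-based command gating and a domain allow/deny check. It is not Codex's rule syntax or implementation: the rule table, the `allow`/`prompt`/`deny` decision names, the example domains, and both function names are assumptions for illustration.

```python
import shlex
from typing import List, Optional, Tuple

# Hypothetical rules: (command prefix as tokens, decision).
PREFIX_RULES: List[Tuple[List[str], str]] = [
    (["gh", "pr", "view"], "allow"),     # read-only, auto-allowed
    (["kubectl", "delete"], "prompt"),   # risky mutation, needs approval
    (["kubectl", "apply"], "prompt"),
]

ALLOWED_DOMAINS = {"api.github.com"}  # illustrative values only
DENIED_DOMAINS = {"pastebin.com"}


def decide_command(command: str, default: str = "prompt") -> str:
    """Return the decision of the longest matching prefix rule."""
    tokens = shlex.split(command)
    best: Optional[Tuple[List[str], str]] = None
    for prefix, decision in PREFIX_RULES:
        if tokens[:len(prefix)] == prefix:
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, decision)
    return best[1] if best else default


def decide_domain(host: str) -> str:
    """Deny list wins; anything not explicitly allowed is denied."""
    if host in DENIED_DOMAINS:
        return "deny"
    return "allow" if host in ALLOWED_DOMAINS else "deny"
```

Defaulting unknown commands to `prompt` and unknown domains to `deny` keeps the gate fail-closed, which is one plausible reading of the principle quoted above.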
Codex exports OpenTelemetry logs for user prompts, tool approvals, tool results, MCP server usage, and network proxy decisions. OpenAI's internal triage uses these to answer why Codex did something when an endpoint alert fires — explaining intent, not just behavior. log_user_prompt = true is the default for prod.[4]OpenAI — Running Codex safely at OpenAI
OpenAI shipped a Codex Chrome extension (macOS + Windows) that drives the user's real authenticated browser — same profile, same cookies, same logged-in apps — across multiple parallel tabs. Unlike the in-app browser, it creates its own Chrome tab group so it doesn't hijack your workspace, and it leverages code execution to script repetitive work instead of relying on the slow screenshot→reason→click loop.[5]OpenAI — Codex can now use Chrome directly on macOS and Windows
This pairs naturally with the OpenAI safety blueprint above — the same managed network policy and OpenTelemetry trace cover Chrome actions.
Simon Willison flips the conventional wisdom: with large context windows, Markdown's token-efficiency win matters less than HTML's expressiveness. Asking Claude for an HTML explanation gets you SVG diagrams, interactive widgets, and in-page navigation — not just bold text and bullets.[6]simonwillison.net — The Unreasonable Effectiveness of HTML
Asking Claude for an explanation in HTML means it can drop in SVG diagrams, interactive widgets, in-page navigation and all sorts of other neat ways of making the information more pleasant to navigate.
The post collects examples at Thariq Shihipar's gallery and references the recent Linux kernel exploit write-up at copy.fail as a real-world case where HTML output dramatically improved comprehension over Markdown. It dovetails with Gary Tan's "thin harness, fat skills" argument (topic 9): Markdown is code that an LLM compiles, but HTML lets it compile into an actually interactive artifact for humans.
llm CLI, tools.simonwillison.net.
OpenRouter shipped a fourth tool type — HITL tools — to its Agent SDK. The onToolCalled hook inspects each invocation: return a value and the agent continues; return null and the loop pauses with status: 'awaiting_hitl' for a human decision, then resumes via callModel with a function_call_output item.[7]OpenRouter — Human-in-the-Loop Tools for the Agent SDK
requireApproval) always pause — binary consent for "delete this database" style actions.
The optional onResponseReceived hook lets you stamp metadata, normalize, or enrich the human's response before it reaches the model. If it throws, the error surfaces as { error, originalOutput } rather than getting silently swallowed. This is a cleaner pattern than scattering branching logic across the orchestrator, and it stacks with the Anthropic Managed Agents primitives (topic 8).
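The pause-and-resume flow described above can be sketched in a few lines. This is a language-neutral simulation in Python, not the OpenRouter SDK (which is a TypeScript package): the function names, the refund policy, and the dictionary shapes are assumptions; only the `awaiting_hitl` status, the `function_call_output` item type, and the `{ error, originalOutput }` shape come from the description.

```python
from typing import Any, Callable, Dict, Optional

ToolCall = Dict[str, Any]


def auto_approve_small_refunds(call: ToolCall) -> Optional[Any]:
    """Hypothetical policy hook: answer automatically, or return None to pause."""
    if call["name"] == "refund" and call["args"]["amount"] <= 50:
        return {"approved": True}
    return None


def run_tool_call(call: ToolCall,
                  on_tool_called: Callable[[ToolCall], Optional[Any]]) -> Dict[str, Any]:
    """A returned value lets the loop continue; None pauses for a human."""
    result = on_tool_called(call)
    if result is None:
        return {"status": "awaiting_hitl", "pending": call}
    return {"status": "completed", "output": result}


def resume_with_human(state: Dict[str, Any], human_response: Any,
                      on_response_received: Optional[Callable[[Any], Any]] = None
                      ) -> Dict[str, Any]:
    """Build the item fed back to the model once a human has decided."""
    try:
        output = (on_response_received(human_response)
                  if on_response_received else human_response)
    except Exception as exc:
        # Surface the hook failure instead of silently swallowing it.
        output = {"error": str(exc), "originalOutput": human_response}
    return {"type": "function_call_output",
            "call_id": state["pending"]["id"],
            "output": output}
```

Keeping the approve-or-pause decision in one hook is what lets the orchestrator stay free of per-tool branching.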
@openrouter/agent, Zod schemas, OpenRouter Agent SDK.
Nathaniel Whittemore's weekly recap argues the AI narrative just forked in three directions at once: Ezra Klein and A16z's David George publishing back-to-back "no AI jobpocalypse" essays, JPMorgan's Dimon and BlackRock's Fink declaring there is no AI bubble (Fink: "we have supply shortages"), and Elon Musk effectively folding xAI into SpaceX while betting his future on Terafab — a chip plant the NYT now pegs at $55–119B, up from the original $20–25B estimate.[8]AI Daily Brief — The Week the AI Story Shifted
~01:20 Ezra Klein cited Alex Tabarrok's "relational sector" argument — services where the human providing it is part of the value (think tutoring, coaching, hospitality) can't be disrupted the same way. A16z's David George followed with hard data: a chart of US employment by sector since 1850 showing labor diversifying, not collapsing, and a notable stat that mentions of AI in earnings calls reference augmentation over substitution at an 8:1 ratio.
~08:00 The Anthropic–Google 5 GW deal is now reported at $200B over 5 years. Despite the circular-funding optics (Google's $40B Anthropic investment), Google stock added ~12% on the news. Carmen Lee's quote: "A capital bubble is a financing phenomenon. A compute bubble requires every physical bottleneck to clear at once" — GPUs, power, substations, colo, cooling, operators.
~11:00 Wednesday's announcement that Anthropic is taking over Colossus 1's full capacity is paired with xAI ceasing to exist as a separate company (folded entirely into SpaceX). Whittemore reads Elon's pivot as repositioning from model developer to infrastructure provider — and Anthropic's commitment to Terafab gives Elon "an unquenchable source of demand" beyond Tesla/Optimus.
~22:00 Conflicting reports on whether the White House will require model pre-release vetting. Politico's source: "There's one or two people very intent on government regulations, but they're sort of the minority of the bunch."
~23:00 The "play with this weekend" pick is Codex's new /goal command — Philip Corey's "Ralph loop" that keeps Codex working on a single objective across turns until verified complete. A16z's Andrew Chen reported 14 hours of unattended progress on an eGPU+Mac driver project; Alex Finn shipped a full extraction-shooter video game from a single /goal prompt with image generation enabled for assets.
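The shape of a "work until verified complete" loop can be sketched as follows. This is not how /goal is implemented; `run_until_goal`, the `step` and `verify` callables, and the turn cap are assumptions chosen to show the control flow only.

```python
from typing import Any, Callable, Dict


def run_until_goal(step: Callable[[Any], Any],
                   verify: Callable[[Any], bool],
                   max_turns: int = 50) -> Dict[str, Any]:
    """Repeat one agent turn until the verifier passes or the budget runs out.

    step   : takes the previous state (None on the first turn), returns a new state
    verify : returns True once the objective is demonstrably met
    """
    state: Any = None
    for turn in range(1, max_turns + 1):
        state = step(state)
        if verify(state):
            return {"done": True, "turns": turn, "state": state}
    return {"done": False, "turns": max_turns, "state": state}
```

The turn cap matters for unattended runs: without it, a verifier that can never pass would keep the loop spending tokens indefinitely.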
The biggest advancement in AI coding this year has been /goal, and it isn't even close. It allows your AI agent to quite literally work for days without stopping. You give it a mission, it works until the mission is complete. — Alex Finn
~20:00 Cloudflare laid off 1,100 just months after hiring 2,000; Coinbase's transaction revenue was down 40% YoY. Whittemore notes that for the first time, observers are pushing back on the lazy "AI did it" framing instead of accepting it.
Dan Shipper interviews Angela (head of product, Claude platform) and Caitlin (head of engineering, Claude platform) at Code with Claude. The big tell: the team is converging on a future where users specify only an outcome and a budget — Claude picks the model, spawns the sub-agents, and writes its own harness on the fly. Today's harness engineering is "rent in San Francisco" — expensive but the cost of being in the future.[9]Every — The Secrets of Claude's Agent Platform From the Team Who Built It
~02:00 Built on the same Messages API everyone else uses, but bundled with code execution, web search, sandboxes, memory, vaults for credentials, and multi-agent orchestration. Anthropic builds first-party products on the same platform — "internal platform = external platform."
~12:00 Old wisdom: build a generic harness and hot-swap models. New reality: each frontier model is being trained against specific primitives (Claude → file systems + skills; others → reasoning loops). The "hot-swap unit" is shifting from model to harness + model. Internally Anthropic tried multiple memory harnesses for Managed Agents and saw "drastically different" eval results from each.
~23:00 Recurring example: a marketing-copy → legal-review agent. The marketer talks to "Claude" but underneath it's many Claudes with different system prompts and skills. References Stripe's Minions and Ramp's internal agent platforms as the template.
~34:00 Generalizable patterns Anthropic sees customers building: advisor + executor separation, generator + adversary, fan-out/swarm (good for bug hunting), best-of-N. Each architecture suits different problem shapes — and you can hill-climb at the architecture layer, not just the prompt layer.
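Two of the patterns named above, best-of-N and generator + adversary, reduce to small control loops. The sketch below uses plain callables in place of model calls; the function names and signatures are assumptions for illustration, not an Anthropic API.

```python
from typing import Callable, List, Tuple


def best_of_n(generate: Callable[[int], str],
              score: Callable[[str], float],
              n: int) -> str:
    """Sample n candidates and keep the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=score)


def generator_adversary(generate: Callable[[List[str]], str],
                        attack: Callable[[str], List[str]],
                        max_rounds: int = 3) -> Tuple[str, int]:
    """Alternate drafting and attacking until the adversary finds nothing.

    Returns the last draft and the number of rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)
        findings = attack(draft)
        if not findings:
            return draft, round_no
        feedback = findings
    return draft, max_rounds
```

Because both functions take the model calls as parameters, the architecture can be compared or swapped without touching the prompts, which is the "hill-climb at the architecture layer" point.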
~37:00 Shipper proposes "having a little funeral" for decommissioned agents — a graveyard page on their site. Caitlin's answer: Anthropic ships skills that help agents auto-upgrade to new models, and the "most AGI-pilled" customers run monitoring agents over their other agents.
~39:00 Angela: a year out, the API surface should compress to outcome + budget. Caitlin (the engineering grounding): "the platform has to seriously scale" because agents will be constantly running and recreating themselves — the bottleneck is whether the token pipe can keep up.
We'd want to experiment with directions where Claude actually gets so good at understanding itself, it figures out what model you should be using, it figures out how to spin up all the sub-agents. We don't have to think so much about harness engineering in that world.
After 13 years away from code, YC president Gary Tan shipped a full blogging platform + agentic newsroom ("Gary's List") in 5 days for ~$200 in Claude Max — versus $4M and 18 months the first time he built Posterous in 2008. The Lightcone interview is a manifesto for "token-maxing": treat tokens like SF rent — expensive but the cost of living in the future.[10]Y Combinator — Tokenmaxxing: How Top Builders Use AI To Do The Work Of 400 Engineers
~01:00 Posterous v1 (2008): $4M, 1.5 years, 7 people. Posterous v2 (Post Haven, 2013): $100k, 3 months, 2 people. Gary's List v3 (2026): $200 in tokens, 5 days, solo — and it includes deep-research-grade article generation, recursive web crawling, and full RAG over an issue archive. After normalizing for logical lines of code, Gary calculates he's writing 400× the volume he did at his 2013 peak — while running YC full-time.
~10:00 GStack started as Gary's Apple Notes of recurring prompts. The breakthrough: telling Claude "before you start work, make an ASCII diagram of all data flows, inputs, outputs, error messages." That single instruction loads the right context into latent space and the agent boils the ocean correctly. The "CEO Plan" skill uses Brian Chesky's 10-star-experience framing ("what's a 6-star, a 7-star...") — meta-prompted from the original plan-review skill.
~21:00 Don't rewrite the harness — Claude Code already exists. Put all your effort into markdown skills. The wedding-planner analogy: markdown is the checklist a human would write to teach the next planner; code is for the deterministic actions (the Twilio call to the venue). The bug pattern Gary keeps seeing: people putting in code what should be in markdown, because code can't handle special cases the way latent space can.
~17:00 Gary discovered the YC community split between Claude Code (ADHD CEO mode) and Codex (200-IQ near-nonverbal CTO mode). He shipped a /codex GStack skill that hands the current plan or repo to Codex from inside Claude Code and pipes the bug list back. The reverse /claude works from within Codex.
~36:00 "It's like SF rent. It seems ridiculous, but it's the cost of being in the future." Spending $500/day on tokens for a serious project is the correct play. Founders who economize on tokens are economizing on the wrong axis. Gary's pitch to skeptics: "We have the same MacBook Pro. There's nothing between us and millions of years of borrowed machine consciousness."
I just want the machine to do the stuff that I don't want to do. I never want to be entirely out of the loop. You can be a time billionaire by borrowing the time from the machines.
Andrew Ng's weekly newsletter covers five stories: ByteDance expanding Seedance 2.0 to CapCut's 736M MAU (filling the void OpenAI left when it killed Sora); Nvidia's RL + LLM stack producing chip designs that beat human engineering benchmarks by 20–30%; a Gallup survey of 23,700 workers showing 50% AI adoption and 65% reporting productivity gains; a UT-Austin/UCLA/NTU/Sony method hitting 81.2% on the LIBERO robot benchmark while avoiding catastrophic forgetting; and Ng's letter arguing against the "AI jobpocalypse."[11]DeepLearning.ai — The Batch Issue 352
Bill Dally's team uses NVCell, PrefixRL, ChipNeMo, and BugNeMo across five design stages — layout to verification. The "unconventional designs that exceed human benchmarks" framing matches the recent Google AlphaEvolve narrative (Verkoran also cited) and reinforces the topic-7 thesis that AI demand is itself AI-driven.
Ng joins Ezra Klein and David George (topic 7) in arguing the AI-jobs-collapse story is overblown. His evidence: software engineering hiring is still strong despite Copilot/Claude Code; macro data contradicts the narrative; he predicts an "AI jobapalooza" of net-new roles. Three independent essays in one week on this thesis is itself a signal.
There will be no AI jobpocalypse. — Andrew Ng
The 81.2% LIBERO result combines RL with parameter-efficient fine-tuning to add new tasks without erasing old ones — directly attacking the "catastrophic forgetting" problem that has dogged robot foundation models. Pairs with Google DeepMind's recent Gemini Robotics-ER 1.6.
A week after copy.fail, "Dirty Frag" dropped — two separate Linux kernel vulnerabilities chained together to defeat AppArmor namespace restrictions. Both exploit the same primitive: splice() + non-linear SKB fragments to bypass copy-on-write and corrupt page cache entries for /etc/passwd via in-place crypto operations. One variant brute-forces an FCrypt 56-bit key to decrypt the cipher text into the exact plaintext the attacker wants. The cadence of these AI-assisted kernel findings is what's new.[12]Low Level — SERIOUSLY? AGAIN?
~04:00 Variant 1 (ESP): IPsec ESP input check skips copy-on-write when the SKB is non-linear. Attacker splices their socket data with a page pointing at /usr/bin/sudo, forces in-place decryption, and writes 4 bytes into the sudo page cache — same shape as copy.fail.
~07:00 Variant 2 (RXRPC): Used when AppArmor blocks user namespaces. FCrypt 56-bit block cipher decrypts in place — same buffer for cipher and plaintext. Attackers brute-force keys until decryption blanks the password X field in /etc/passwd, leaving root with no password. The kernel comment literally said "we probably want to decrypt this directly in a target buffer and not do it in place."
People that are already like kernel exploit primitive masters can now just search at such a higher velocity that it's honestly insane. This is the new paradigm with AI-enabled vulnerability research.
~11:00 The deeper read: this is the offensive side of the Mythos coin (topic 1). Defenders and attackers are both getting machine-scale code-reading capabilities at the same time. The defender's only structural advantage is being able to deploy these tools internally before public disclosure — which is exactly Anthropic's Mozilla deployment model.
Károly Zsolnai-Fehér breaks down OpenAI's GPT-5.5 Instant system card. Headlines: hallucination rates on medical/legal queries roughly halved; the instant model now approaches frontier thinking models on several benchmarks, including a new TroubleshootingBench where it scores just under PhD-expert level (~36%) on real biology protocol errors. The catch: refusal rate on hard adversarial multi-turn prompts is roughly cut in half at the model level — patched at the system level with classifier bouncers.[13]Two Minute Papers — OpenAI's ChatGPT 5.5 Instant
~03:00 A nice methodological aside: OpenAI now penalizes verbose answers with a "length tax" to prevent previous gaming of HealthBench. GPT-5.5 still beat 5.3 even after paying the length tax — so the gain is real, not verbosity-juiced.
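The video does not give the penalty formula, so the following is only one plausible form of a length tax: a linear deduction proportional to how far an answer exceeds a token budget. The function name, the budget, and the tax rate are all assumptions.

```python
def taxed_score(raw_score: float, n_tokens: int,
                budget_tokens: int, tax_rate: float = 0.1) -> float:
    """Subtract a penalty that grows with the overage beyond the budget.

    An answer at or under budget keeps its raw score; each additional
    budget's worth of tokens costs tax_rate points, floored at zero.
    """
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    overage = max(0, n_tokens - budget_tokens) / budget_tokens
    return max(0.0, raw_score - tax_rate * overage)
```

Under any scheme of this kind, a model that wins after the deduction cannot have won by padding alone, which is the point of the comparison between 5.5 and 5.3.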
~05:00 The biology-refusal regression on hard adversarial prompts is the worrying part. OpenAI shipped it anyway, behind two classifier "bouncers" that screen prompts before the model sees them and screen outputs before the user sees them. The combined system blocks effectively — but Zsolnai-Fehér flags that fixing safety at the harness/classifier level rather than the model level lets issues run deeper into the pipeline.
Imagine a car that is unsafe on a track. They would not fix the car itself, but put stronger guardrails around the track. Does it solve the problem? Kind of. But you let issues run deeper into the pipeline.
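The "bouncer" arrangement amounts to wrapping the model call in two screens. The sketch below shows that structure with plain callables standing in for the classifiers and the model; every name here is hypothetical and the refusal text is a placeholder.

```python
from typing import Callable, Dict, Optional


def guarded_call(prompt: str,
                 model: Callable[[str], str],
                 input_blocked: Callable[[str], bool],
                 output_blocked: Callable[[str], bool],
                 refusal: str = "Request declined.") -> Dict[str, Optional[str]]:
    """Screen the prompt before the model sees it and the output before the user does."""
    if input_blocked(prompt):
        return {"text": refusal, "blocked_at": "input"}
    text = model(prompt)
    if output_blocked(text):
        # The model already produced the content; only the wrapper stopped it.
        return {"text": refusal, "blocked_at": "output"}
    return {"text": text, "blocked_at": None}
```

The `output` branch illustrates the concern raised above: when the second screen fires, the unsafe text already exists inside the pipeline, and safety depends entirely on the wrapper.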
This is the inverse of Anthropic's "teaching Claude why" approach (topic 2) — instructive contrast.
Tech Brew runs two AI-workplace stories on the same day. First: emotion-AI surveillance is hitting US workplaces — MorphCast in Zoom, Slack's Aware monitoring sentiment, MetLife scoring call-center vocal energy, Burger King's OpenAI-powered "Patty" headset rating drive-thru friendliness, Meta tracking mouse and keystrokes. The market is projected to reach $9B by 2030 (3×). The EU banned it for workplace use; the US has no equivalent.[14]Tech Brew — Workplace Surveillance Gets an Emotional Upgrade
Researchers note Americans scowl when angry only about a third of the time, and frowning often signals concentration. The technology is being deployed at scale despite a thin evidence base — and US workers in most states aren't even legally entitled to know they're being monitored.
WSJ's Joanna Stern's book I Am Not A Robot (out May 12) documents replacing Google search with ChatGPT, Perplexity, Gemini, and Claude for a full year. The habit stuck — multimodal context dumps proved the killer feature — but the journalistic concern about reduced primary-source contact is real.[15]Tech Brew — Joanna Stern's Great Gen AI Experiment
They don't actually know the truth; they just predict what words are likely to come next. — Joanna Stern
oMLX is a new inference runtime built on Apple's MLX framework that handles the KV cache differently: hot context lives in unified memory, older system prompts and tool definitions get paged to SSD via persistent caching. In a Codex coding task on an M2 MacBook with Qwen 3.6 35B, it generated at 47 tok/s vs. LM Studio's 16 tok/s, finished in 20 minutes vs. 35, and hit 89% cache efficiency across a 1.78M-token run.[16]Better Stack — oMLX local runner
~04:00 The reviewer used Codex instead of Claude Code specifically because Claude Code's system prompt eats ~16.2k tokens out of a 32k context — leaving only 16k for the actual project. Codex's leaner system prompt is essential for local-model workflows where every token counts.
~05:00 The killer feature: when context overflows, the actual computational state is on SSD, so a /clear in Codex doesn't erase the project — oMLX hydrates the model from disk on the next prompt and Codex picks up where it left off. The trade-off is occasional 400 errors at the context ceiling vs. LM Studio's rock-solid stability but RAM-saturating behavior.
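The hot-in-memory, cold-on-SSD idea can be illustrated with a toy two-tier cache. This is not oMLX's code or data layout: `TieredCache`, its LRU eviction policy, and the pickle-per-key file format are assumptions, and keys are assumed to be filename-safe strings.

```python
import os
import pickle
from collections import OrderedDict
from typing import Any, Optional


class TieredCache:
    """Keep recently used entries in memory and spill the rest to disk."""

    def __init__(self, hot_capacity: int, spill_dir: str) -> None:
        self.hot: "OrderedDict[str, Any]" = OrderedDict()
        self.hot_capacity = hot_capacity
        self.spill_dir = spill_dir
        os.makedirs(spill_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def _spill(self, key: str, value: Any) -> None:
        with open(self._path(key), "wb") as fh:
            pickle.dump(value, fh)

    def put(self, key: str, value: Any) -> None:
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)  # least recent
            self._spill(old_key, old_value)

    def get(self, key: str) -> Optional[Any]:
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as fh:
                value = pickle.load(fh)
            self.put(key, value)  # rehydrate into the hot tier
            return value
        return None

    def clear_hot(self) -> None:
        """Drop the in-memory tier while keeping every entry recoverable."""
        while self.hot:
            key, value = self.hot.popitem(last=False)
            self._spill(key, value)
```

After `clear_hot()`, a later `get()` reloads the entry from disk, which mirrors the described behavior of a /clear that does not lose the project state.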
Two rapid-fire devtool firehoses landed May 8. GitHub Trending #33 features 35 projects worth knowing about — including Vercel DeepSeek (multi-stage AI security harness orchestrating Claude Opus 4.7 + GPT-5.5 across 1,000+ parallel sandboxes), free-claude-code (a local proxy intercepting Claude Code traffic to free models), Atlas (Rust+CUDA Blackwell engine hitting 100+ tok/s on Qwen 3.6 35B, 3× vLLM), Goal Buddy (a structured /goal-style scout-judge-worker loop), Photo Agents (vision-grounded memory with autonomous skill writing), and the gloriously cursed Cursed Browser (no rendering engine — sends raw HTML to a VLM and asks it to hallucinate what the page should look like).[17]Github Awesome — GitHub Trending Weekly #33
marimo's team published a 23-minute walkthrough of multi-cursor edits (Cmd-D), cell navigation, and AI-shortcut chords for their reactive Python notebook. Practical for anyone running marimo notebooks day-to-day.[21]marimo — The Best Keyboard Shortcuts for marimo
~08:00 Codex-Server: an MCP "cost-aware router" pattern — your frontier model is the tech lead reviewing; a cheaper model is the junior dev generating patches. Same shape as Gary Tan's /codex + /claude swap (topic 9) but at the MCP layer.
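The tech-lead/junior-dev split can be sketched as a routing loop in which the cheap model drafts, the expensive model only reviews, and the expensive model writes the patch itself only after repeated rejections. This is an illustration of the pattern, not Codex-Server's code; `route_patch` and the three callables are hypothetical.

```python
from typing import Callable, Dict, Union


def route_patch(task: str,
                junior: Callable[[str, int], str],
                reviewer: Callable[[str, str], bool],
                senior: Callable[[str], str],
                max_attempts: int = 2) -> Dict[str, Union[str, int]]:
    """Let the cheap model try first; escalate only if review keeps failing.

    junior   : (task, attempt number) -> candidate patch
    reviewer : (task, patch) -> True if the patch is acceptable
    senior   : task -> patch written by the expensive model
    """
    for attempt in range(1, max_attempts + 1):
        patch = junior(task, attempt)
        if reviewer(task, patch):
            return {"patch": patch, "author": "junior", "attempts": attempt}
    return {"patch": senior(task), "author": "senior", "attempts": max_attempts}
```

The cost saving comes from the asymmetry: reviewing a patch usually takes far fewer output tokens than generating one, so the frontier model is spent where its judgment matters most.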