Mythos might be a head fake

AI Models, Hot Take

Mythos leaked, but Opus is probably the real launch

Nate Herk argues the Claude Mythos hype has a real basis: a model identifier briefly appeared, Project Glasswing access widened, and prediction markets started pricing a public launch. His base case is still quieter: Mythos capabilities flow into a future Opus rather than a public Mythos button.^{[1]Is Claude Mythos Coming?} Last Week in AI adds the model-card angle: Opus 4.8 improved coding scores and shipped alongside dynamic workflows, while Mythos remains the higher-risk system Anthropic is holding back for safety work.^{[2]Last Week in AI #247 - Opus 4.8, MAI, Anthropic IPO, Minimax-M3}

~00:00 The leak and the safety wall

Herk frames Mythos as a tier above Opus with unusually strong cybersecurity capability. The public signal was a brief API identifier appearance, but he keeps coming back to Anthropic's earlier position that the preview was not planned for general release. His plausible path is a limited or renamed capability drop, especially if OpenAI ships a new model first.

~05:15 Opus 4.8 is the public frontier

Last Week in AI describes Opus 4.8 as a real but incremental improvement over 4.7, with stronger agentic coding numbers and a more verbose, argumentative personality. The more strategically important piece is dynamic workflows: a model writing a graph of sub-agent work for tasks that can run hours or days.^{[2]Last Week in AI #247 - Opus 4.8, MAI, Anthropic IPO, Minimax-M3}
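To make "a graph of sub-agent work" concrete, here is a minimal sketch that runs hypothetical sub-tasks in dependency order using Python's standard graphlib module. The task names and the run_subagent stub are invented for illustration; nothing here reflects Anthropic's actual workflow format.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a sub-agent task, each value is the set
# of tasks that must finish first. All names are illustrative assumptions.
workflow = {
    "plan": set(),
    "write_code": {"plan"},
    "write_tests": {"plan"},
    "run_tests": {"write_code", "write_tests"},
    "summarize": {"run_tests"},
}


def run_subagent(task: str) -> str:
    """Stand-in for dispatching a task to a sub-agent."""
    return f"{task}: done"


def execute(graph: dict) -> list:
    """Run tasks in dependency order and return the order they finished in."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    finished = []
    while sorter.is_active():
        # Everything in one batch has its dependencies met, so a real system
        # could run the batch in parallel. Sorting only keeps output stable.
        for task in sorted(sorter.get_ready()):
            run_subagent(task)
            finished.append(task)
            sorter.done(task)
    return finished


order = execute(workflow)
```

In this sketch write_code and write_tests land in the same batch, which is the part a long-running orchestrator would fan out to separate sub-agents.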

~16:20 Frontier labs are becoming systems companies

The podcast's point is that model intelligence alone is no longer the whole frontier. Orchestration, workflow generation, and long-running task infrastructure become part of the product, and they also create the usage data labs need to keep improving agent behavior.

Tools: Claude Mythos, Claude Opus 4.8, Project Glasswing, dynamic workflows

Productivity, Developer Tools

Nate B Jones, Latent Space

Taste files are turning rejection into infrastructure

Nate B Jones says the valuable artifact in AI work is often the rejection: the moment you say no, encode a constraint, or correct taste. CommandCode's Taste system turns that idea into repo-local memory, learning repeated preferences from edits and merges so agents stop relearning the same micro-decisions.^{[3]The hidden value in your AI's worst outputs}^{[4]The most expensive AI mistake you are making}^{[5]Making DeepSeek v4 outperform Opus 4.7 with Taste}

~00:00 Rejections are knowledge events

Jones argues that teams are generating useful negative examples constantly, then letting them disappear into chats, Slack threads, and comments. The capture has to happen where the work happens; otherwise nobody will maintain a separate spreadsheet of taste.

~00:00 Scaling your no

The companion short is blunt: as generated output grows, taste cannot stay implicit. Every no should be treated as a reusable rule, constraint, or preference that compounds across future work.

~03:05 CommandCode's Taste layer

Ahmad Awais describes Taste as a compact, repository-local memory system that records repeated project preferences: package manager choices, CLI conventions, testing defaults, release habits, and other small decisions that developers rarely write down. The goal is not a giant rules file; it is a reviewed, evolving set of taste notes that can steer any coding agent.^{[5]Making DeepSeek v4 outperform Opus 4.7 with Taste}

Tools: CommandCode, Taste, skills, Claude, DeepSeek

AI Models, Developer Tools

Latent Space, Last Week in AI

Open models got better when the harness stopped scolding them

Latent Space's CommandCode interview explains why some open models feel worse in coding agents than their raw intelligence suggests: repeated schema and tool-call errors can trap them in loops. CommandCode repairs common mistakes deterministically, returns the result plus a hint, and reports that DeepSeek, Kimi, and MiniMax-class models become much more usable when the harness helps instead of only rejecting.^{[5]Making DeepSeek v4 outperform Opus 4.7 with Taste} Last Week in AI's Minimax-M3 discussion points in the same direction: open models are competing on cost, speed, context length, and deployability, not just benchmark rank.^{[2]Last Week in AI #247 - Opus 4.8, MAI, Anthropic IPO, Minimax-M3}

~07:05 Tool confusion

Awais says DeepSeek V4 Pro would sometimes repeat the same malformed tool call dozens of times after receiving a strict schema error. The failure was not only model capability; it was an agent harness that gave an unhelpful correction and then waited.

~10:08 Repair, then teach

CommandCode's repair approach is deterministic: coerce a nullable or wrong-shaped argument when the intent is obvious, execute the tool, and return a repair hint explaining what should have been sent. The reported behavior is that later tool calls improve because the model got both the result and a pattern correction.^{[5]Making DeepSeek v4 outperform Opus 4.7 with Taste}
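The repair-then-teach pattern can be sketched in a few lines. This is an illustration of the idea as described in the interview, not CommandCode's implementation: the read_files tool, its paths and limit arguments, and the repair rules are all hypothetical.

```python
import json
from typing import Any


def repair_args(args: dict) -> tuple:
    """Deterministically fix obvious shape mistakes; return (fixed_args, hints)."""
    fixed = dict(args)
    hints = []
    # A single string where a list of paths is expected: wrap it.
    if isinstance(fixed.get("paths"), str):
        fixed["paths"] = [fixed["paths"]]
        hints.append('"paths" must be a list of strings, e.g. ["a.txt"]')
    # A numeric string where an integer is expected: convert it.
    limit = fixed.get("limit")
    if isinstance(limit, str) and limit.isdigit():
        fixed["limit"] = int(limit)
        hints.append('"limit" must be an integer, not a string')
    # An explicit null for an optional field: drop it so the default applies.
    if "limit" in fixed and fixed["limit"] is None:
        del fixed["limit"]
        hints.append('omit "limit" instead of sending null')
    return fixed, hints


def read_files(paths: list, limit: int = 100) -> str:
    """Hypothetical tool that only reports what it was asked to do."""
    return f"read {len(paths)} file(s), limit={limit}"


def call_tool(raw_args: str) -> dict:
    """Repair the arguments, run the tool, and return the result plus hints."""
    args, hints = repair_args(json.loads(raw_args))
    result: Any = read_files(**args)
    return {"result": result, "repair_hints": hints}


response = call_tool('{"paths": "README.md", "limit": "20"}')
```

The model gets a usable result and a note on what it should have sent, rather than a bare schema error it may simply repeat.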

~57:58 The open-model lane

Last Week in AI covers MiniMax-M3 as part of the same competitive lane: million-token context, sparse-attention speedups, lower serving cost, and eventual open weights. The takeaway is that harness quality can make cheaper models feel much closer to frontier systems for real coding work.

Tools: DeepSeek, Kimi, MiniMax-M3, CommandCode, Zod, tool repair

AI Tools, Developer Tools

AICodeKing, Github Awesome, OpenAI, AI Engineer

Agent UIs are moving from terminal logs to surfaces

Hermes 0.16 adds a native desktop app, remote gateway profiles, a fuller dashboard, model search, undo, and a trimmed skill set. GitHub's MCP Apps talk makes the same bet inside VS Code: tool results should render as sandboxed interactive UI, not just text. The result is a broader shift from hidden agent internals toward inspectable surfaces for models, sessions, skills, tools, and enterprise workflows.^{[6]Hermes Agent 5.0 (New Upgrades): HERMES BECAME ULTRA-HERMES!}^{[7]GitHub Trending Weekly #35}^{[8]OpenAI Investor Innovation Day}^{[9]Building Interactive UIs in VS Code with MCP Apps}

~01:05 Hermes gets visible

AICodeKing calls Hermes 0.16 the Surface release because the agent platform now exposes its internals through a desktop app, web dashboard, remote gateways, profile switching, fuzzy model search, and a cleaner default skill system. The important usability change is that sessions, credentials, MCP catalog settings, webhooks, and messaging channels are no longer buried in config files.

~04:17 MCP Apps

GitHub's AI Engineer talk describes MCP Apps as tool results that include a UI resource reference. VS Code fetches and renders that HTML inside a sandboxed iframe, letting an Excalidraw diagram, flame graph, checkout, or data explorer stay interactive inside chat instead of flattening into text.^{[9]Building Interactive UIs in VS Code with MCP Apps}

~00:00 The open-source grab bag

Github Awesome's weekly list is packed with the same surface-area theme: Butterbase exposes 43 backend tools to coding agents, Memory OS and Persistia add longer-lived context, Harness Terminal and Vigil add control planes, and Rift gives each agent an isolated workspace.^{[7]GitHub Trending Weekly #35}

~00:02 Enterprise adoption is already operational

OpenAI's investor clip is light on technical detail, but the enterprise examples are operational: deal folders, Excel workflows, hundreds of internal GPTs, and 2,700 employees with enterprise licenses. The surface question matters because agent UX is no longer a side-project problem.

Tools: Hermes, MCP Apps, VS Code, GitHub Copilot, Excalidraw, Butterbase, Memory OS, Vigil, Rift

Developer Tools, AI Tools

AI Engineer

Evals and payments both need determinism

Ara Khan's AI Engineer talk says evals are neither scoreboard truth nor useless vibes: good agent evals require realistic tasks, isolated environments, failure attribution, and careful hill-climbing without overfitting. Stripe's autonomous-payments talk lands on a parallel principle: discovery can be probabilistic, but credentials, checkout, and spending limits need deterministic APIs and enforceable constraints.^{[10]Evals Are Broken, Use Them Anyway}^{[11]Building safe Payment Infrastructure for the autonomous economy}

~03:17 Evals as engineering and philosophy

Khan splits the bad eval discourse into two camps: people who over-trust benchmark dashboards and people who retreat entirely to taste. His recommendation is to use current, precise evals, let new model releases settle before switching, and build use-case-specific eval suites when public ones do not match the work.

~10:21 Agent evals need real environments

For coding agents, a task can involve reading files, installing dependencies, debugging infrastructure, running tests, and making tradeoffs. Khan points to Terminal Bench-style tasks and infrastructure that fans isolated jobs out in parallel, then asks agents to classify failures from traces so the harness can improve.^{[10]Evals Are Broken, Use Them Anyway}
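A toy version of that loop, assuming nothing about Terminal Bench or Harbor internals: each task runs as a separate Python process in its own temporary directory, tasks fan out in parallel, and a rule-based function stands in for the agent that classifies failures from traces. A temporary working directory is not a real sandbox; it only keeps the example self-contained.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_task(name: str, script: str) -> dict:
    """Run one task in its own temp directory and capture its stderr trace."""
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            cwd=workdir, capture_output=True, text=True, timeout=30,
        )
    return {"task": name, "passed": proc.returncode == 0, "trace": proc.stderr}


def classify_failure(trace: str) -> str:
    """Toy stand-in for an agent that attributes a failure from its trace."""
    if "ModuleNotFoundError" in trace:
        return "environment"
    if "AssertionError" in trace:
        return "wrong_answer"
    return "unknown"


# Hypothetical tasks: one passes, one fails its check, one lacks a dependency.
tasks = {
    "passes": "assert 1 + 1 == 2",
    "bad_math": "assert 1 + 1 == 3",
    "missing_dep": "import not_a_real_module_xyz",
}

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda item: run_task(*item), tasks.items()))

report = {
    r["task"]: "pass" if r["passed"] else classify_failure(r["trace"])
    for r in results
}
```

Separating "the environment was broken" from "the agent got it wrong" is the attribution step that lets a team fix the harness instead of blaming the model.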

~00:14 Deterministic money rails

Stripe's Steve Kaliski uses the same boundary for payments: let LLMs explore, search, and recommend, but do not let nondeterminism handle credentials, charge amounts, seller identity, or checkout state. Shared payment tokens scope credentials to seller, time, amount, and currency; HTTP 402-style machine payments describe paid API resources; and the Agentic Commerce Protocol gives agents structured checkout state instead of a fragile browser scrape.^{[11]Building safe Payment Infrastructure for the autonomous economy}
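The scoping idea is easy to state as a deterministic check. The sketch below is not Stripe's API: ScopedToken, its fields, and authorize are hypothetical names that only mirror the four scopes mentioned in the talk (seller, time, amount, currency).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal


@dataclass(frozen=True)
class ScopedToken:
    """Hypothetical credential limited to one seller, currency, cap, and expiry."""
    seller_id: str
    currency: str
    max_amount: Decimal
    expires_at: datetime


def authorize(token: ScopedToken, seller_id: str, amount: Decimal,
              currency: str, now: datetime) -> tuple:
    """Return (allowed, reason). Every rule is a plain comparison, no model call."""
    if seller_id != token.seller_id:
        return False, "seller mismatch"
    if currency != token.currency:
        return False, "currency mismatch"
    if amount <= 0 or amount > token.max_amount:
        return False, "amount out of scope"
    if now >= token.expires_at:
        return False, "token expired"
    return True, "ok"


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
token = ScopedToken("seller_123", "usd", Decimal("50.00"),
                    issued + timedelta(minutes=15))

ok = authorize(token, "seller_123", Decimal("42.00"), "usd", issued)
too_much = authorize(token, "seller_123", Decimal("75.00"), "usd", issued)
```

An agent can choose what to buy however it likes, but whether the charge goes through depends only on these comparisons.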

Tools: Terminal Bench, Harbor, Stripe shared payment tokens, Machine Payments Protocol, Agentic Commerce Protocol

Developer Tools, Industry, Hot Take

Theo - t3.gg

Cloudflare bought the Vite-shaped future of deployment

Theo's Cloudflare/VoidZero video argues the acquisition is not just about owning Vite, Vitest, Rolldown, Oxc, and the people around them. It is about making deployment legible to agents: code should describe infrastructure, a CLI should provision the cloud, and the gap between working locally and running publicly should shrink from hours to one command.^{[12]Cloudflare bought Vite to destroy Vercel}

~03:03 Void's deployment thesis

Theo describes Void as an attempt to make Vite apps deploy directly to real infrastructure. The developer writes code against platform primitives like database, KV, storage, queues, and AI; the platform reads that code and provisions the matching production resources.

~18:08 Agents change the ROI

His sharper claim is that agents make this kind of integrated cloud worth building. A human can tab into a dashboard and click through setup; an agent is much better at editing code than navigating bad dashboards. That makes code-first cloud primitives more valuable in an agent-built software world.

~26:14 Cloudflare's strategic gap

Cloudflare already has strong CDN, compute, and database primitives, but its developer experience has historically required platform-specific configuration and Wrangler knowledge. VoidZero gives it a credible path leftward, toward the framework, bundler, and app-deployment experience.^{[12]Cloudflare bought Vite to destroy Vercel}

Tools: Vite, Vitest, Rolldown, Oxc, Vite Plus, Cloudflare Workers, Wrangler, Lakebed

Developer Tools, AI Tools

Simon Willison, LearnThatStack, Arjay McCandless

The safest code runner today might be tiny Python in WASM

Simon Willison released micropython-wasm, an alpha library for running MicroPython inside WebAssembly from Python, aimed at sandboxed plugin and agent execution. LearnThatStack's rate-limit guide and Arjay McCandless's AWS bill short supply the operational footnotes: when agents and APIs run code for users, CPU, memory, concurrency, retries, caching, logging, and idle resources all become product concerns.^{[13]Running Python code in a sandbox with MicroPython and WASM}^{[14]micropython-wasm 0.1a2}^{[15]How API Rate Limiting Actually Works and How to Build Your Own}^{[16]How to reduce AWS cloud bills}

Sandbox requirements

Willison wants a Python code sandbox that installs cleanly from PyPI, enforces memory and CPU limits, controls file and network access, and exposes selected host functions. WebAssembly via wasmtime gives him a promising isolation layer, while MicroPython supplies a small interpreter that can be embedded as a WASM blob.^{[13]Running Python code in a sandbox with MicroPython and WASM}

Persistent sessions and host functions

The prototype maintains interpreter state by running a loop inside MicroPython that blocks on host-provided code, evaluates it, and reports results. Version 0.1a2 adds a CLI so users can try simple code execution with uvx micropython-wasm.^{[14]micropython-wasm 0.1a2}
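The persistent-session pattern can be illustrated in ordinary CPython, with a thread standing in for the MicroPython guest and two queues standing in for the host bridge. This is not the micropython-wasm API, and exec here provides no isolation at all; it only shows the loop shape: block on host-provided code, evaluate it in a long-lived namespace, report the result.

```python
import queue
import threading


def interpreter_loop(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Block on incoming code, run it in one long-lived namespace, report back."""
    namespace: dict = {}
    while True:
        source = inbox.get()              # blocks until the host sends code
        if source is None:                # sentinel: shut the session down
            break
        try:
            try:
                # Expressions are evaluated so their value can be reported.
                result = eval(compile(source, "<session>", "eval"), namespace)
            except SyntaxError:
                # Statements (assignments, defs) fall back to exec.
                exec(compile(source, "<session>", "exec"), namespace)
                result = None
            outbox.put(("ok", result))
        except Exception as exc:          # report errors instead of crashing
            outbox.put(("error", repr(exc)))


inbox: queue.Queue = queue.Queue()
outbox: queue.Queue = queue.Queue()
threading.Thread(target=interpreter_loop, args=(inbox, outbox), daemon=True).start()


def run(source: str) -> tuple:
    """Host side: send code to the session and wait for its reply."""
    inbox.put(source)
    return outbox.get()


first = run("x = 20")      # state is stored in the session namespace
second = run("x + 22")     # a later call can see the earlier assignment
failed = run("1 / 0")      # errors come back as data
inbox.put(None)
```

Because the namespace outlives each call, the second snippet sees x from the first, which is the behavior the prototype gets by keeping one interpreter loop alive.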

~00:00 Rate limits as API contracts

LearnThatStack covers the other side of safe execution: respect Retry-After, use exponential backoff with jitter, cap concurrency before you hit limits, cache stable data, and implement server-side limiters through Redis, Nginx, or gateways depending on whether you need per-user policy.^{[15]How API Rate Limiting Actually Works and How to Build Your Own}
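On the client side, the first two rules fit in a short helper. This is a minimal sketch with invented names (backoff_delay, call_with_retries, and a send callable returning status, headers, and body). It handles only the delay-in-seconds form of Retry-After, not the HTTP-date form, and uses "full jitter", meaning a uniform random delay up to the exponential ceiling.

```python
import random
import time
from typing import Callable, Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retrying; `attempt` is 0 for the first retry."""
    # Prefer the server's explicit instruction when it is a plain number.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    # Otherwise: exponential ceiling, capped, with full jitter below it.
    ceiling = min(cap, base * (2 ** attempt))
    return random.random() * ceiling


def call_with_retries(send: Callable[[], tuple], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `send` until it stops returning HTTP 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return body
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError("still rate limited after retries")


# Demo with canned responses and a fake sleep that just records the waits.
responses = iter([
    (429, {"Retry-After": "2"}, ""),
    (429, {}, ""),
    (200, {}, "ok"),
])
waits: list = []
body = call_with_retries(lambda: next(responses), sleep=waits.append)
```

Jitter matters because many clients retrying on the same schedule would otherwise hit the limiter again in lockstep.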

~00:00 Cloud bills as feedback

Arjay's cost advice is simple but relevant: downsize underused compute, cache and index expensive database paths, control logging volume, and delete idle resources. Agentic systems will make these boring practices more important, not less.

Tools: MicroPython, WebAssembly, wasmtime, Datasette Agent, Redis, Nginx, AWS CloudWatch

Industry, AI Tools

Y Combinator

Emergent says app builders have crossed into real revenue

Y Combinator's Emergent interview is the strongest business story of the day: Mukund Jha says the nine-month-old platform has 8.5 million users, more than 10 million apps built, and over $100M in annualized revenue by helping non-programmers ship working software rather than demos.^{[17]Emergent: How Six Months of Tinkering Led To A $100M ARR Company}

~01:00 Real apps, not prototypes

Jha says Emergent started as a coding-agent research lab, topped a coding benchmark with a four-person team, then turned that foundation into a consumer platform for building, hosting, deploying, and maintaining software through chat.

~18:08 Technical foundation

The system is described as a multi-agent orchestrated platform with design, testing, memory, and infrastructure components. The team built container and snapshotting technology so multiple parallel agents can work from the same state, and Jha says the architecture has already been rewritten three times as new model classes changed what was possible.^{[17]Emergent: How Six Months of Tinkering Led To A $100M ARR Company}

~27:13 The startup lesson

The interview's advice is to think globally from day one and aim at harder problems, because AI gives small teams access to the same frontier primitives and makes ambition cheaper to test.

Tools: Emergent, coding agents, parallel agents, app builders

Industry, Hot Take

Last Week in AI

Tech teams lose senior leverage before they lose headcount

Last Week in AI's short on evaporative cooling argues that the most mobile people leave first, which makes AI labs and infrastructure teams weaker in ways headcount charts hide. The full podcast ties that labor-market point to model races, Anthropic's IPO positioning, Minimax-M3, cyber risk, YouTube AI labels, and new research on model capacity and self-recognition.^{[18]The Evaporative Cooling Problem in Tech Teams}^{[2]Last Week in AI #247 - Opus 4.8, MAI, Anthropic IPO, Minimax-M3}

~00:00 Evaporative cooling

The short argues that when senior people or co-founders leave a company, the loss is not proportional to headcount. In AI work, tight coupling between RL infrastructure, evals, post-training, data systems, and product loops means senior coordination talent is especially hard to replace.

~57:58 Open models and business pressure

The episode's open-source lane covers MiniMax-M3 as another sign that open or open-weight models are attacking frontier labs through price, speed, long context, and deployability. The business lane discusses high revenue multiples and the pressure for AI application startups to entrench before labs move downmarket.^{[2]Last Week in AI #247 - Opus 4.8, MAI, Anthropic IPO, Minimax-M3}

~92:10 Labels, capacity, and self-recognition

Later sections cover YouTube automatically labeling photorealistic AI videos, research on why larger models retain rarer tasks, and post-trained models recognizing their own generations. The recurring thread is that AI systems are becoming social, economic, and organizational infrastructure, not just model releases.

Tools: MiniMax-M3, YouTube AI labels, model evals

Developer Tools, Productivity

Github Awesome, Real Python, Two Minute Papers

The day's practical developer lane was all edges

Github Awesome's weekly roundup covered the long tail of agent-era tools: autonomous dataset builders, open TTS, backend-as-a-service with MCP tools, memory layers, sandboxes, isolated workspaces, local agent safety, and terminal testing. Real Python pointed to free US datasets for analysis practice, while Two Minute Papers floated game-master-style agents as a new gameplay primitive.^{[7]GitHub Trending Weekly #35}^{[19]Free US Datasets for Python}^{[20]AI Agents as Games Masters?}

~00:00 Repos worth scanning

The strongest developer-tool entries were BigSet for generated datasets, MisoTTS for open-weight emotional speech, Butterbase for backend provisioning through agent tools, Memory OS and Persistia for long-term context, Sandboxes and Rift for isolated workspaces, Vigil and Harness Terminal for policy controls, and Nullsec S1 for security review.

~00:00 Free datasets

Real Python's quick item highlights a PyPI package of US datasets covering politics, crime, wages, income, elections, mortality, presidents, education, sports, stocks, and entertainment. It is not flashy, but it is useful raw material for notebooks, tutorials, and model-eval examples.

~00:00 Agents as game masters

Two Minute Papers' short gestures at AI agents embedded in games as story drivers or player assistants. It is early, but the design question is obvious: when does an agent become a dynamic game master instead of a scripted NPC?

Tools: BigSet, MisoTTS, Butterbase, Memory OS, Rift, Vigil, Nullsec S1, US datasets

Industry

Morning Brew

Macro data and GLP-1s made the non-AI business lane weird

Morning Brew's business lane had two useful non-AI reads: US employers added 172,000 jobs in May, far above expectations, while GLP-1-driven weight loss is creating new apparel return costs as shoppers size down and reorder wardrobes.^{[21]Hiring surged well beyond all expectations in May}^{[22]GLP-1-related returns leave retailers reeling}

Jobs beat expectations

The May jobs report showed 172,000 jobs added versus roughly 80,000 expected, with upward revisions to March and April lifting the three-month average to 188,000. Morning Brew notes the strength came with warning signs: unemployment held at 4.3%, long-term unemployment rose, and wages lagged inflation.^{[21]Hiring surged well beyond all expectations in May}

Weight loss as retail friction

GLP-1 adoption is showing up in apparel returns. Morning Brew cites rising exchanges from shoppers sizing down, doubled merchandise-return value from 2020 to 2025, lower plus-size sales at Macy's, smaller sizes at Victoria's Secret, and later wedding-dress purchases.^{[22]GLP-1-related returns leave retailers reeling}

Podcast, Industry

Dwarkesh Patel, Acquired

The human stories were about wandering, founders, and power

Dwarkesh Patel's Adam Brown clip is a story about hitchhiking as radical listening: a short ride becomes a 1,000-mile detour and a visit to a father's grave. Acquired's Vanguard clip is the opposite energy: institutional power forcing Jack Bogle out even after he had become the public face of the index-investing movement.^{[23]A 10-mile ride turned into a 1,000-mile spiritual quest - Adam Brown}^{[24]Vanguard forced out Saint Jack Bogle in 1999}

~00:00 The 1,000-mile detour

Brown describes truckers opening up after decades alone in the cab. One planned 10-mile ride turned into a 1,000-mile spiritual quest, including a stop at the driver's father's grave in Baton Rouge. The story is only loosely tech-adjacent, but it is a useful reminder that trust can appear in very high-bandwidth, low-formality contexts.

~00:00 Bogle's second firing

Acquired's clip revisits Vanguard's 1999 age-limit move that forced Jack Bogle off the board despite his saint-like status with investors. The detail that the rule was not applied evenly makes the move read less like governance hygiene and more like a power struggle.^{[24]Vanguard forced out Saint Jack Bogle in 1999}