April 16, 2026
Anthropic released claude-opus-4-7 at the same $5/M input, $25/M output pricing as 4.6, but with a new tokenizer that encodes the same prompt into up to 1.35x more tokens
[1]Anthropic — Introducing Claude Opus 4.7.
Vision leaps: it accepts images up to 2,576px on the long edge (~3.75 megapixels, 3x prior Claude), scores 98.5% on XBOW visual acuity vs 54.5% for 4.6, and Rakuten reports it resolves 3x more production SWE tasks
[1]Anthropic — Introducing Claude Opus 4.7.
A new xhigh effort level sits between high and max, and Claude Code's default effort was silently upgraded to xhigh, meaning identical workflows now burn roughly as many tokens as Opus 4.6's old max
[3]Better Stack — Opus 4.7 Is GREAT (except the token usage).
A new /ultrareview slash command triggers dedicated review sessions — Pro and Max users each get three free ultrareviews
[1]Anthropic — Introducing Claude Opus 4.7.
The default effort level jumps to xhigh across all plans, and auto mode is extended to Max subscribers as a safer alternative to bypass-permissions
[2]Developers Digest — Claude Opus 4.7 in 5 Minutes.
On the API side, task budgets are now in public beta and file-system-based memory is improved for long-running multi-session agents
[1]Anthropic — Introducing Claude Opus 4.7.
Better Stack tested 4.7 head-to-head with 4.6, GPT-5.4, and Gemini 3.1 on a personal-finance dashboard (~00:00). Opus 4.7 won with a clean dark-mode React/TypeScript/Vite app and a working in-memory Express backend — but skipped a persistent DB (Opus 4.6 had used SQLite) and oddly picked React 18 + React Router 6 despite a newer knowledge cutoff [3]Better Stack — Opus 4.7 Review.
The cost warning (~02:30): the new tokenizer can use up to 35% more tokens on the same input, and 4.7 thinks more at higher effort levels. The presenter notes 4.7's high actually outscores 4.6's max while using fewer tokens — so dropping from xhigh to high in Claude Code settings may preserve quality while cutting costs.
The same input prompt could now use up to 35% more tokens, and it also thinks more. So that's even more tokens to burn.
Anthropic bundled Claude Code into a redesigned desktop app next to Claude chat and Claude Co-work — positioned as an "agent orchestration command center" with multi-session sidebar, integrated terminal, and drag-and-drop workspaces [5]The AI Daily Brief — Vibe Coding Gets an Upgrade. Theo (t3.gg) spent an hour with it and found 40+ bugs — broken copy-paste, hotkeys targeting the wrong window, voice input bleeding into every textbox — and built his own open-source competitor T3 Code with Julius in one-fifth the time [4]Theo - t3.gg — Claude's new Cursor killer just dropped. Alongside the app, Anthropic introduced Claude Code routines: event-triggered agents (GitHub events, API calls) running on Anthropic-managed cloud infrastructure — "dynamic cron jobs" [5]The AI Daily Brief — Vibe Coding Gets an Upgrade.
Per AI Daily Brief (~02:01), the app is explicitly built for how agentic coding feels now: many sessions in flight, humans in the orchestrator seat. Features include a sidebar of active/recent sessions filterable by status/project/environment, an integrated terminal and file editor, and drag-and-drop workspace layout. Cursor 3 shipped an identical UI earlier this month and Codex is signaling the same — the three stacks are converging.
Cursor, Codex, and Claude Code desktop app look exactly the same now.
Also introduced (~05:03): Claude Code routines. A routine packages a prompt, repos, and MCP connectors, then runs on Anthropic's cloud infrastructure — keeping the agent working when your laptop is closed. Anthropic uses them internally for docs maintenance and backlog grooming. Enterprise vibe-coding hardening is the companion thread: Lovable added native payments ("you're not vibe coding PCI level one"), Superblocks 2.0 pitches vibe coding as an "enterprise attack vector," and Microsoft is testing Open Claw inside Copilot (~07:03).
Theo opens with begrudging approval — the new app uses less memory than the CLI ("a trash piece of software") and the Claude virtual machine service burns ~2.5 GB RAM but doesn't wreck the system [4]Theo - t3.gg — Claude's new Cursor killer. Then he hits an avalanche of UX failures inside minutes (~04:00):
Window-switching hotkeys (command-1/2/3) always target the first window regardless of focus, and .gitignore edits are among the broken flows.
AI is really good at building the happy path… but as soon as you hit an edge case… models do not find edge cases. Users find edge cases.
Theo and Julius shipped T3 Code, a free open-source desktop app, in roughly 1/5 the time (~06:40). Features he shows off: favicon auto-grab, multi-tab with scroll built in two prompts, diff view via diffs.com, snapshot-based before/after, properly bound hotkeys, and handling of multi-million-token threads (Julius scrolled 10 minutes to reach the top of one). The Codex CLI is Apache-2.0 and ships an app server any harness can plug into; only the Codex desktop is closed, which annoyed Theo enough to build T3 Code in the first place.
If the Codex app was open source like the CLI is, I probably wouldn't have made T3 Code.
He closes on strategic intent (~23:40): lab-built coding apps exist to lock users in and showcase models. On both fronts, Theo says Claude Code fails. He calls out Claude still demanding CLAUDE.md / .claude/ while Cursor and friends adopt the AGENTS.md / agents/ standard, and notes Anthropic's terms of service are "shitty" specifically to block switching.
The level of slop that is being shipped by Anthropic is unfathomable and y'all just grin and bear it.
OpenAI radically expanded Codex from a coding assistant into a general-purpose autonomous agent. A new "background computer use" mode lets Codex drive its own cursor across any Mac app in parallel with the user, spin up multiple simultaneous agents, and schedule future tasks via "Heartbeat Automations"
[6]OpenAI — Codex for (almost) everything.
Codex now supports 90+ plugins (Atlassian Rovo, CircleCI, GitLab Issues, Microsoft Suite, Neon by Databricks, Superpowers), ships image generation via gpt-image-1.5, and has cross-session memory
[7]OpenAI — Codex for (almost) everything (YouTube).
Usage is 6x since January, now over 2M weekly users
[7]OpenAI — Codex for (almost) everything (YouTube).
Codex now operates a dedicated desktop cursor that opens apps, clicks, types, and runs tasks without blocking or interfering with the user's current work [6]OpenAI — Codex for (almost) everything. An in-app browser lets users annotate webpages to give Codex context and instructions, and the browser is initially scoped to frontend/game dev tasks [7]OpenAI — Codex for (almost) everything (YouTube). Heartbeat Automations let Codex schedule future tasks and resume long-horizon work — the same "dynamic cron" pattern Anthropic just shipped with routines.
gpt-image-1.5 for mockups, product concepts, and in-game visuals. Cited enterprise pilots: Notion and Ramp for engineering automation [7]OpenAI — Codex for (almost) everything (YouTube). OpenAI's positioning is that Codex is now an agentic operating layer — not a coding chat — with the same "orchestration command center" framing as Anthropic's redesigned Claude Code.
Codex goes from a code-editor-centric tool to covering full-day agentic workflows similar in scope to Anthropic's computer-use offerings.
Nate Herk argues Opus 4.7 isn't as big a leap as it looks — Anthropic silently dropped 4.6's effort default to "medium" and disabled extended thinking on February 9 without announcing it, then released 4.7 as a "fix" [8]Nate Herk — Claude Opus 4.7 Just Dropped... Or Did It Really?. He cites an AMD senior director's analysis of ~7,000 Claude Code sessions finding thinking depth collapsed 73% and the model skipped reading files before editing them 33.7% of the time, with $200/month plans burned through in hours [8]Nate Herk — Claude Opus 4.7 Just Dropped... Or Did It Really?.
At ~00:00, Nate lays out the conspiracy: users reported hallucinated git commit hashes, fake package names, and premature task abandonment starting in February. The AMD engineer's 7,000-session audit is the core evidence — thinking depth down 73%, file-read-before-edit compliance down to 33.7%. Nate's framing: 4.7's gains partly restore deliberately degraded 4.6 behavior.
At ~04:02, Nate concedes the structural wins: the new xhigh effort tier, vision gains from architecture changes, a more expensive tokenizer (1–1.3x), SWE-bench Pro jumps, a 2x+ gain on biomolecular reasoning, and the new /ultrareview slash command. Anthropic also published a 232-page system card for 4.7. In hands-on tests Nate found 4.7 better at financial chart analysis, and it caught its own math errors in a SaaS model — but he rated the 4.6 interactive deliverable more polished.
At ~10:01, Nate piles on the desktop app critique: Theo found 40+ bugs within one hour — broken buttons, voice input bleeding into every textbox, layout issues. He asks how "one of the world's largest AI companies" shipped trivial one-prompt fixes as day-one bugs.
Simon Willison ran Qwen3.6-35B-A3B locally on a MacBook Pro M5 using a 20.9 GB quantized GGUF (Unsloth's Q4_K_S) through LM Studio — and found it outperformed Claude Opus 4.7 on the SVG pelican benchmark [9]Simon Willison — Qwen3.6 on my laptop drew a better pelican than Opus 4.7. On the pelican-on-bicycle test, Qwen produced correct bicycle geometry, clouds, and a detailed pouch; Opus 4.7 — even with thinking_level: max — generated a frame with an entirely wrong shape. Simon notes he doubts the quantized local model beats Opus on general capability — but for SVG illustration tasks specifically, the local model currently wins.
Tooling referenced: the llm-lmstudio plugin; thinking_level: max.
On "flamingo on unicycle," Qwen added creative SVG comments like <!-- Sunglasses on flamingo! --> and richer character details, while Opus produced a "competent if slightly dull vector illustration" lacking visual personality. Simon is careful to bound the claim: local Qwen is not beating Opus at general tasks — just at this specific aesthetic SVG task.
Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7.
Better Stack pitted Anthropic's cloud-based /ultraplan against the local Superpowers skill on a real release-pipeline feature. Superpowers asked double the clarifying questions (6 vs 3) and produced a plan 4x longer (833 vs 195 lines) with test-driven task breakdowns — but Ultraplan wins for async workflows: kick off on a laptop, continue on a phone
[10]Better Stack — Claude Ultraplan vs Superpowers.
The /ultraplan slash command spins up a cloud container, clones the GitHub repo, and produces a plan in ~2-3 minutes — suggesting sub-agent parallelism
[10]Better Stack — Ultraplan vs Superpowers. It consumed 33% of the creator's monthly Pro token quota for one plan and initially cloned the wrong repo (context issue; corrected after a comment).
Superpowers runs entirely locally, asks 6 clarifying questions (vs 3), and has two phases: a design plan and an implementation plan. Output was 833 lines with test cases and test-first task execution, and used ~75k tokens with prompt caching. The creator prefers it for the ~90% of work he does locally, because his skills, MCP tools, and the whole dev loop live there.
At ~06:09: device-agnostic async — start a feature on a laptop, close it, continue on a phone or tablet. Requires repo on GitHub and a Pro or Max sub. The creator forgot to install the Claude GitHub app on the demo repo, which would have auto-created PRs.
If I do choose to work locally, which I'm doing 90% of the time, then I'll probably use Superpowers because all my code is there locally, my skills, my MCP tools, everything.
Simon shipped llm-anthropic 0.25, adding support for claude-opus-4.7 (including thinking_effort: xhigh) and two new options: thinking_display (shows extended reasoning output as JSON) and thinking_adaptive. Default max_tokens raised to the per-model maximum; an obsolete beta header removed
[11]Simon Willison — llm-anthropic 0.25.
Separately, he built a Claude Artifact to edit the datasette.io news YAML with live preview and validation, flagging both markdown syntax and YAML formatting errors before they reach the live site
[12]Simon Willison — datasette.io news preview.
claude-opus-4.7, supports thinking_effort: xhigh
thinking_display: boolean for displaying extended reasoning (JSON output only)
thinking_adaptive: boolean for adaptive thinking behavior
max_tokens raised to per-model maximum
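A minimal sketch of driving those options from Python via the llm library. The model alias and option values come from the release notes above; whether your installed plugin exposes exactly these names is an assumption.

```python
import llm

# Assumes llm-anthropic 0.25+ is installed and an Anthropic API key is configured.
model = llm.get_model("claude-opus-4.7")

response = model.prompt(
    "Explain the trade-off between effort levels in two sentences.",
    thinking_effort="xhigh",   # new top effort tier, per the 0.25 release notes
    thinking_display=True,     # emit extended reasoning (JSON output only, per the notes)
)
print(response.text())

# CLI equivalent:
#   llm -m claude-opus-4.7 -o thinking_effort xhigh "your prompt"
```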
The datasette.io site maintains its news section as a YAML file in a GitHub repo, which Simon found error-prone to edit directly. He used Claude Artifacts' ability to analyze GitHub repos in conversation to build a custom preview UI that accepts YAML input and renders how entries will appear on the homepage, plus a validation badge surfacing markdown/YAML errors before commit.
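A rough local approximation of what that Artifact checks, assuming a hypothetical news.yaml layout where each entry carries a date and a markdown body (the real datasette.io schema may differ):

```python
import sys
import yaml        # pip install pyyaml
import markdown    # pip install markdown

def validate_news(path: str) -> list[str]:
    """Collect YAML and markdown problems before they reach the live site."""
    try:
        entries = yaml.safe_load(open(path, encoding="utf-8"))
    except yaml.YAMLError as exc:
        return [f"YAML parse error: {exc}"]
    if not isinstance(entries, list):
        return ["expected a top-level list of news entries"]
    problems = []
    for i, entry in enumerate(entries):
        for key in ("date", "body"):       # hypothetical required fields
            if key not in entry:
                problems.append(f"entry {i}: missing '{key}'")
        try:
            markdown.markdown(entry.get("body", ""))
        except Exception as exc:
            problems.append(f"entry {i}: markdown error: {exc}")
    return problems

if __name__ == "__main__":
    issues = validate_news(sys.argv[1])
    print("\n".join(issues) if issues else "OK")
```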
OpenAI released GPT-Rosalind, its first domain-specialized life sciences model — targeting biochemistry, genomics, and drug discovery workflows. Named after Rosalind Franklin, it ranked above the 95th percentile of human experts on prediction tasks, hit the 84th percentile for sequence generation, and posted the highest published score on the BixBench bioinformatics benchmark [13]OpenAI — Introducing GPT-Rosalind for life sciences research. Launch partners include Amgen, Moderna, the Allen Institute, and Thermo Fisher Scientific, via a limited "trusted access program."
GPT-Rosalind supports evidence synthesis, hypothesis generation, experimental planning, and multi-step research. It can query specialized databases, parse literature, call computational tools, and propose experimental pathways. A companion Life Sciences Codex plugin connects the model to 50+ scientific tools and data sources.
Initially a research preview for qualified enterprise customers — not general availability. Positions OpenAI directly against Google's life-sciences push; no public pricing was announced.
OpenAI framed the model as a tool to help scientists move faster through analytically demanding work, not as a replacement for researchers.
OpenAI's Joy Jiao (research lead) and Yun Wang (product lead) walk through the new biochemistry-focused model series that anchors GPT-Rosalind, a research plugin with 50+ templatized skills, the Ginkgo Bioworks collaboration where GPT-5 designed a biology experiment that produced a non-zero amount of protein on its first try, biosecurity safeguards, and a 10-year vision of autonomous research institutes [14]OpenAI Podcast — Ep 16: Building AI for Life Sciences.
At ~01:00, Joy and Yun describe a biochemistry-focused model series anchored on complex research workflows. Starts with genomics and protein understanding, focused on early discovery where more thinking time can break research bottlenecks. Ships across ChatGPT (literature synthesis) and Codex (long-trajectory agentic work). The life sciences research plugin exposes 50+ "templatized repeatable workflows" — cross-evidence match across papers, pathway analysis — as one-click deploys on top of the base models.
One of the taglines was to scale test-time compute to cure all disease. So that is like our team tagline.
At ~05:06: when GPT-5 finished training, the team wasn't sure it could do any biology (training data was mostly math and CS). The Ginkgo collaboration (July 2025) tested whether models could design experiments that actually produce the desired product. First set of designs came back with non-zero protein production — surprising even the team.
The future that me and Joy see is that it's no longer human bottlenecks but rather maybe compute bottlenecks.
At ~09:10: bio is one of the most severe rising-capability risks. Precursor steps to dangerous pathogens look benign and similar to legitimate research ("help me clone a gene" could be GFP or a toxin). Current stance is risk-averse self-refusal, frustrating professional scientists. Roadmap: differentiated access — verify users at legitimate research institutions or pharmas where reagents are tracked, then unlock fuller capability.
The safest model here would be a model that had no capability — it's very safe, but it's not very good.
At ~20:14, Joy frames two scaling axes: parameter growth (GPT-2 → GPT-3 produced emergent properties) and test-time compute on reasoning models. Models can now "think for days or effectively forever" about a problem — reframing data centers as complex reasoning infrastructure (hence Stargate). At ~24:16: she wants to see AI design a new drug or cure a disease within a few years. Nearer-term wins: drug repurposing via mechanistic understanding and personalized medicine via ASO/RNA design.
You have these autonomous labs — mostly robots, all hooked up to AI — autonomous research institutes that are constantly running and curing human disease.
OpenAI is scaling Trusted Access for Cyber (TAC) — originally launched February 2026 — to thousands of verified individual defenders and hundreds of teams. The centerpiece is GPT-5.4-Cyber, a fine-tuned variant of GPT-5.4 with lower refusal boundaries for legitimate defensive work, supporting binary reverse engineering for malware analysis and vulnerability detection [15]OpenAI — Accelerating the cyber defense ecosystem. OpenAI's Codex Security agent has contributed to 3,000+ critical and high-severity vulnerability fixes, and Codex for Open Source has scanned 1,000+ projects.
Availability: chatgpt.com/cyber; enterprise direct with OpenAI.
Physical Intelligence co-founder Quan Vuong lays out PI's thesis: one cross-embodiment foundation model trained on ~10 robot platforms beats specialists by 50% [16]YC — The GPT Moment for Robotics Is Here. Nearly all of PI's complex demos — coffee-making, laundry folding, mobile nav — run inference in a real remote data center via "real-time chunking." And he gives a full playbook for the "Cambrian explosion" of vertical robotics startups.
At ~05:04, Vuong walks through the Open X-Embodiment (RT-X) result: absorb data from ~10 heterogeneous platforms into a single high-capacity model, and the generalist beats every specialist by 50%. The model learns "how to control any robot," not one specific platform — which sidesteps the drift problem where single-platform data goes stale every 3 months.
There is this joke in robotic grad school that if you want to add two years to your PhD, just work on a new robot platform.
At ~23:13, the hot take: almost all of PI's robot evaluations — including the really complex demos — run inference on a model hosted in a real remote data center, not on-device. "Real-time chunking" queries the next chunk 50ms before the current ~100ms queue ends so chunks stitch together smoothly. Vuong says he has never physically seen the Weave or Ultra robots and doesn't know how their data is collected — an intentional decoupling.
Almost all of the robot evaluation that we run at PI today, including the really complicated demo… the model is actually hosted in the cloud. This is not a server in the office. It's a real cloud.
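A toy simulation of the timing idea only: the ~100ms chunk length and 50ms prefetch lead come from the talk, while the function names, the fake 40ms round trip, and everything else here are invented for illustration and are not PI's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK_MS = 100          # each action chunk covers ~100 ms of motion (from the talk)
PREFETCH_LEAD_MS = 50   # ask for the next chunk 50 ms before the queue drains

def query_cloud_policy(observation):
    """Stand-in for the remotely hosted model (hypothetical); ~40 ms round trip."""
    time.sleep(0.04)
    return ["action"] * 10

def control_loop(num_chunks: int = 5):
    pool = ThreadPoolExecutor(max_workers=1)
    chunk = query_cloud_policy(None)
    deadline = time.time() + CHUNK_MS / 1000
    pending = None
    for _ in range(num_chunks):
        while True:
            remaining_ms = (deadline - time.time()) * 1000
            # Kick off the next request while the current chunk is still executing,
            # so the fresh chunk is ready (or nearly ready) when the queue runs dry.
            if pending is None and remaining_ms <= PREFETCH_LEAD_MS:
                pending = pool.submit(query_cloud_policy, None)
            if remaining_ms <= 0:
                break
            time.sleep(0.005)   # the robot would be executing `chunk` here
        chunk = pending.result()
        deadline = time.time() + CHUNK_MS / 1000
        pending = None

control_loop()
```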
At ~13:08: tasks that last year needed hundreds of hours of data collection are now doable zero-shot. Weave (YC, ex-Apple founders) folds diverse laundry in a real laundromat. Ultra (YC) runs in a real e-commerce warehouse packing real Amazon orders into narrow soft pouches — 100 minutes at 4x speed, minimal intervention, with lighting changes historically brutal for robotics vision.
It still blows my mind to see a robot actually folding laundry because until ChatGPT I didn't know if this would exist even in my entire lifetime.
At ~29:18, Vuong names the recipe for vertical robotics startups: (1) understand the existing workflow; (2) use cheap hardware — reactive models compensate for imprecise motion; (3) build data-collection + eval infra; (4) deploy mixed-autonomy with humans correcting failures; (5) reach break-even per robot before scaling.
The equation for starting a robotic business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore.
At ~43:37: Vuong's dream side project is an automated robotics research scientist. PI already runs a Claude-based "pre-training on-call" agent babysitting training runs — delivering ~50% improvement in compute utilization. "An embarrassingly large amount of money on API queries."
The episode's framing: most post-fundraise hiring surges fail because companies add ammunition (people who need direction) without adding barrels (people who can independently drive an initiative from idea to done). Hiring more without expanding barrels just stacks people behind the same bottlenecks [17]Lenny's Podcast — Hire barrels, not ammunition.
Can they take an idea and make it happen? One way or the other, they're going to get your company across that hill. That's a barrel.
The failure mode: a CEO hires aggressively, burn rate spikes, output per unit of time barely moves. The real bottleneck is the small count of people who can take an initiative from inception to success. More bodies = more collaboration tax, same barrels.
The number of people that can independently drive an initiative from inception to success is very limited within any company.
If you hire more people without expanding the number of what I call barrels… all you're doing is stacking people behind the same initiatives.
Mario Zechner's AIE Miami talk is a three-act tragedy: why he left Claude Code, why he built Pi (a minimal self-modifying coding agent), and why the wider OSS world is drowning in agent-generated slop [18]Mario Zechner at AI Engineer — Building Pi in a World of Slop. Punchline: "Slow the fuck down. Learn to say no. Fewer features, but the ones that matter."
The harness kept mutating his context behind his back (changing system prompts, injecting "may or may not be relevant" system reminders), offered zero observability, zero model choice, and only shallow hooks spawning a new process per invocation. Alternatives had their own sins — OpenCode prunes tool output past a token threshold ("lobotomizes the model"), injects LSP errors into edit tool results, stores every message as an individual JSON file, and defaults CORS so any website can hit the local server.
My context wasn't my context. Cloud code is the thing that controls my context.
Terminal Bench's December 2025 leaderboard shows the minimalist Terminal harness — which only sends keystrokes to a tmux session — beats native harnesses across model families.
We are in the fuck around and find out phase of coding agents, and their current form is not their final form.
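The "just send keystrokes to tmux" pattern fits in a few lines; this is a generic sketch of that shape (the session name, polling, and example command are mine, not the actual Terminal Bench harness code):

```python
import subprocess
import time

SESSION = "agent"  # arbitrary tmux session name for this sketch

def start_session():
    subprocess.run(["tmux", "new-session", "-d", "-s", SESSION], check=True)

def send_keys(text: str):
    # Type the command into the session and press Enter, exactly as a human would.
    subprocess.run(["tmux", "send-keys", "-t", SESSION, text, "Enter"], check=True)

def read_screen() -> str:
    # Capture whatever is currently visible in the pane; the model only ever sees this.
    out = subprocess.run(
        ["tmux", "capture-pane", "-t", SESSION, "-p"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout

start_session()
send_keys("ls -la")
time.sleep(0.5)
print(read_screen())
```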
Four packages (AI abstraction, agent core = while-loop + tool calling, bespoke TUI framework, the coding agent itself). Four tools (read, write, edit, bash). A tiny system prompt that hands the agent Pi's own handcrafted docs plus example extensions. Extensions are hot-reloaded TypeScript modules shipped via npm. YOLO-by-default on security because he thinks per-call approval dialogs aren't real security. User-built extensions include a 5-minute clone of Anthropic's /bashing, Nico's chatroom of agents talking to each other, and Pi playing NES / Doom. Pi placed sixth on Terminal Bench in October. Pi became the agentic core inside OpenCode; his issue tracker filled with LLM-generated garbage, so he built an auto-close PR filter asking humans to write issues "in your human voice, no longer than a screen of text" — clankers don't read comments. Mitchell turned it into vouch.
You don't need 10,000 tokens to tell them they're a coding agent. They know — because they are coding agents.
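For flavor, the "while-loop + tool calling" core that Pi's description implies looks roughly like the sketch below, with the read/write/edit/bash tool set from the talk; call_model is an explicit placeholder, and none of this is Pi's actual code.

```python
import subprocess
from pathlib import Path

# The four tools from the talk: read, write, edit, bash.
def read(path: str) -> str:
    return Path(path).read_text()

def write(path: str, text: str) -> str:
    Path(path).write_text(text)
    return "ok"

def edit(path: str, old: str, new: str) -> str:
    # Naive single-occurrence string replacement, the usual "edit tool" shape.
    src = Path(path).read_text()
    Path(path).write_text(src.replace(old, new, 1))
    return "ok"

def bash(cmd: str) -> str:
    done = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr

TOOLS = {"read": read, "write": write, "edit": edit, "bash": bash}

def call_model(messages):
    """Placeholder for the LLM call: returns {'tool': name, 'args': {...}} to request
    a tool, or {'tool': None, 'content': ...} when the task is finished."""
    raise NotImplementedError

def agent_loop(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:                              # the whole "agent core" is this loop
        reply = call_model(messages)
        if reply.get("tool") is None:
            return reply["content"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
```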
Agents are "compounding boo-boos with serial learning, no bottlenecks, and delayed pain." 90% of training data is "our old garbage," so every local agent decision pulls in over-abstraction, duplication, backward compatibility, and defense-in-depth — "enterprise-grade complexity within two weeks with just two humans and 10 agents." Review agents ("the ouroboros") don't catch it. 1M-token context windows are "a heck," agentic search also fails, so agents patch locally and break globally — and you can't trust your tests because the agent wrote them.
You know what we call a sufficiently detailed spec? It's a program.
The agent patches locally and fucks up globally. If you see this in your codebase, you're fucked.
Prescription: scope tasks tightly so the agent can find everything it needs, modularize, give it an eval function when possible, let it rip on non-critical and boring work — but for anything that matters, hand-write it.
If you do anything important, write it by hand. Friction is the thing that builds the understanding of the system in your head.
Slow the fuck down. Think about what you're building and why. Don't just build because your agent can do it.
Diego Carpentero argues a fine-tuned ModernBERT classifier is a cheap, self-hostable defensive layer against six LLM attack vectors: direct prompt injection, indirect/context injection, gibberish-suffix alignment breaks, RAG poisoning, MCP tool-description exploits, and agentic attacks. He hits ~85% accuracy at 35–40ms per classification on the InjecGuard dataset, running on commodity hardware for under $1 [19]Diego Carpentero at AI Engineer — $1 AI Guardrails with ModernBERT.
These attacks, they are no longer the exception, they are now the baseline.
Dataset: InjecGuard — 75k labeled examples from 20 open sources. Hugging Face Datasets with batched map tokenization, dynamic padding, a classification head on the CLS token (optionally mean pooling for long sequences), BF16 (cut training memory ~40%, enabling batch size 64), and StableAdamW. Switching base → large added nearly 6 accuracy points. Final: ~85% accuracy, 35–40ms latency, self-hosted for under $1. Demo on HuggingFace Space correctly classifies the Sydney prompt, Wikipedia redirect, ad-review override, GCG gibberish suffix, and MCP key-exfiltration as unsafe.
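The recipe translates into a fairly stock Hugging Face fine-tuning script. The sketch below assumes a hypothetical dataset path with text/label columns and a two-class safe/unsafe scheme, uses the real answerdotai/ModernBERT-large checkpoint, and keeps the Trainer's default AdamW where the talk used StableAdamW.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_ID = "answerdotai/ModernBERT-large"   # base -> large added ~6 accuracy points in the talk
DATASET = "path/to/injecguard"              # hypothetical path to the 75k-example InjecGuard set

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)  # safe / unsafe

ds = load_dataset(DATASET)                  # assumption: ships train/test splits

def tokenize(batch):
    # Batched map keeps tokenization fast; assumes a "text" column (adjust to the real fields).
    return tokenizer(batch["text"], truncation=True, max_length=512)

ds = ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="modernbert-guardrail",
    per_device_train_batch_size=64,   # enabled by BF16's ~40% training-memory saving, per the talk
    bf16=True,
    num_train_epochs=2,
    learning_rate=3e-5,
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    data_collator=DataCollatorWithPadding(tokenizer),  # dynamic padding per batch
)
trainer.train()
```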
In a knowledge database comprising 8 million documents, poisoning only five chunks was enough to be successful in this attack.
We have to build safety mechanisms that protect machines, humans and society.
Two multi-hour livestream sessions from AI Engineer Miami on Apr 16. Day 2 features talks from Cerebras, OpenCode, Cursor, Arize AI, and more [20]AI Engineer Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more!. The keynote track headlines OpenCode, Google DeepMind, and OpenAI [21]AIE Miami Keynote & Talks ft. OpenCode, Google DeepMind, OpenAI, and more!.
Multi-speaker livestream featuring Cerebras (AI inference hardware), OpenCode (open-source coding tools), Cursor (AI code editor), and Arize AI (observability/eval). Transcript was not available on fetch, so specific speaker content and timestamps can't be confirmed beyond the title.
Main keynote-and-talks stream featuring OpenCode, Google DeepMind, and OpenAI. Likely covered high-level announcements, research highlights, and product direction from the largest AI labs present. Transcript also unavailable.
Standalone speaker talks (Mario Zechner, Diego Carpentero) are covered in their own topics above.
Google DeepMind's Gemma 4 (2B–31B, Apache 2.0) hit 10M downloads in its first week; the 31B dense model beats some models 10–20x larger on benchmarks, and the 2B runs on a first-gen Nintendo Switch [22]Two Minute Papers — Why DeepMind's New AI Broke The Internet. A community uncensored fine-tune (SuperGemma-4 26B, Jun Song on Hugging Face) runs at 46.2 tok/s on Apple Silicon via MLX and plugs straight into Hermes agent and Open Claw via an OpenAI-compatible endpoint [23]AICodeKing — SuperGemma-4 (26B) UNCENSORED + Hermes, Open Claw, OpenCode. On the image side, Baidu's Ernie Image (and Turbo variant) dethroned Z Image on benchmarks — Q2K GGUFs run in ~3 GB VRAM [24]AI Search — New BEST local AI image generator is here!.
At ~01:01: 2B runs on a first-gen Nintendo Switch and phones without internet; 31B is a dense (not MoE) model ranking third among open models, beating some 10x-larger models and remaining competitive with 20x-larger ones.
Architectural wins called out: context doubled to 256K, and the license upgraded from the restrictive Gemma License to Apache 2.0. At ~07:08, it's pitched as a drop-in for cloud LLMs inside agentic harnesses like Open Claw — "a frontier model just got locked down for a few select clients… that's all right, just plug in Gemma 4 and you're good to go for free."
A community uncensored fine-tune of Gemma 4 26B A4B (~25B total, ~3.8B active MoE). Native system prompt + function calling, 256K context. MLX 4-bit V2 release on Hugging Face; creator claims Quick Bench overall 95.8 vs 91.4 and 46.2 tok/s vs 42.5 baseline. Setup: pip install mlx-lm, mlx_lm.server with --port 8080, let MLX auto-detect the bundled chat template (forcing a path "can corrupt responses"). GGUF Q4_K_M (~16.8 GB) available for Windows/Linux. Plugs into Hermes agent or Open Claw through any OpenAI-compatible client path.
Anything that can talk to an OpenAI compatible endpoint can basically use it.
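Concretely, any OpenAI client pointed at the local server works. A minimal sketch: the port matches the video's --port 8080, and the model ID is whatever your server reports (both are assumptions about your setup).

```python
from openai import OpenAI

# Local mlx_lm.server started with: mlx_lm.server --port 8080 (per the video)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="supergemma-4-26b",   # assumption: use whatever model ID your local server exposes
    messages=[{"role": "user", "content": "Summarize why MoE models feel fast locally."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```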
Baidu's Ernie Image tops published benchmarks over Z Image, Qwen Image, and Flux 2 Klein, and comes close to the closed-source Nano Banana 2 on head-to-head prompt tests. Wins: photorealism without the plasticky Flux-era look, complex multi-element prompts, in-image text, infographic layout, manga/comic panels. Weaknesses: human anatomy, highly abstract spatial instructions. Base vs Turbo: Turbo is 3–5x faster with minimal quality sacrifice. Both ~16 GB; full stack with Mistral 3B text encoder + Flux 2 VAE is ~20 GB. Unsloth GGUF quantizations bring VRAM to as low as ~3 GB (Q2K). ComfyUI has built-in Ernie Image workflow templates; with the ComfyUI-GGUF extension (by city96), an 8 GB GPU can run Q6_1 (6.7 GB). CFG=1, steps=8 for Turbo, under 10 seconds per generation.
Ernie looks way more realistic and natural and imperfect.
Google's Gemini app now generates personalized images from simple prompts by pulling context and reference photos directly from the user's Google Photos library — powered by the Nano Banana 2 image model and a new "Personal Intelligence" layer [25]Google — Personalized images in Gemini with Nano Banana 2. Prompts like "Design my dream house" or "Create a claymation image of me and my family enjoying our favorite activity" pull real context instead of forcing a prompt-engineering step.
Users connect their Google Photos library via the "+" icon. People and pet labeling from existing Photos organization is used to identify individuals. A Sources button shows which images guided the generated output; users can swap reference photos or request style variations (watercolors, charcoal sketches, oil paintings). Google states the Gemini app does not directly train its models on users' private Google Photos libraries.
Rolling out to US Google AI Plus, Pro, and Ultra subscribers.
Ramp built a multi-tenant AI usage ingestion pipeline (LiteLLM/OpenRouter webhooks → Kafka → ClickHouse) after a LiteLLM upgrade silently introduced phantom Gemini tokens that inflated costs until token-level auditing caught it. ClickHouse ReplacingMergeTree ordered by (business_id, source, event_id) gives storage-level dedup, costs are stored as Decimal(20,10) to prevent drift, and the system handles tens of thousands of events per minute per customer
[26]Ramp Builders — Building a Unified Pipeline for AI Token Spend.
AI gateways (LiteLLM, OpenRouter) → idempotent webhook endpoints → Apache Kafka → ClickHouse → REST analytics API → dashboards. Webhook endpoints:
POST /developer/v1/ai-usage/litellm — LiteLLM Standard Logging format
POST /developer/v1/ai-usage/openrouter — OpenTelemetry OTLP trace with GenAI semantic conventions
ClickHouse uses ReplacingMergeTree ordered by (business_id, source, event_id) with created_at as the version column — replayed events converge to exactly-once without application logic. Decimal(20,10) prevents rounding drift across millions of monthly events. Multi-tenancy: Row-Level Security scoped to business_id. Kafka absorbs traffic spikes without backpressure. Monitoring: Datadog via StatsD.
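The dedup behavior lives entirely in the table definition. Here is a rough sketch of that DDL issued via clickhouse-connect, with column names guessed from the post rather than taken from Ramp's actual schema:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# ReplacingMergeTree keyed on (business_id, source, event_id): replayed webhook events
# collapse to a single row at merge time, with created_at as the version column.
client.command("""
CREATE TABLE IF NOT EXISTS ai_usage_events
(
    business_id   UInt64,
    source        LowCardinality(String),   -- 'litellm' or 'openrouter'
    event_id      String,
    model         String,
    input_tokens  UInt64,
    output_tokens UInt64,
    cost_usd      Decimal(20, 10),          -- avoids rounding drift across millions of events
    created_at    DateTime64(3)
)
ENGINE = ReplacingMergeTree(created_at)
ORDER BY (business_id, source, event_id)
""")
```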
Without Ramp's internal per-product tagging and token-level visibility this mystery would have remained unsolved, and Ramp would have had to swallow the additional costs.
Structural finding: input tokens run ~10x output tokens in production — prompt efficiency is the primary cost lever. Context: nearly 50% of Ramp customers now pay for at least one AI provider, and average monthly AI token spend across customers grew 13x.
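Quick back-of-envelope on why that ratio matters, using the Opus-style $5/$25 per-million pricing from the top of this issue purely as an illustration (Ramp's traffic spans many providers and price points):

```python
input_price_per_m, output_price_per_m = 5.00, 25.00   # illustrative Opus-style pricing
output_tokens = 1_000_000
input_tokens = 10 * output_tokens                      # ~10x input vs output, per Ramp

input_cost = input_tokens / 1e6 * input_price_per_m    # $50
output_cost = output_tokens / 1e6 * output_price_per_m # $25
print(f"input share of spend: {input_cost / (input_cost + output_cost):.0%}")  # ~67%
```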
Allbirds ($BIRD) is reinventing itself as NewBird AI, a GPU-as-a-service rental business on long-term contracts. The company sold its sneaker brand to American Exchange Group for $39M in March 2026, then announced a $50M financing deal to buy GPUs in April. The pivot triggered a 600%+ stock surge (~$3 to over $20) despite a $22M market cap — a stark contrast to its $4B IPO-peak valuation in 2021 [27]The Rundown AI — Allbirds ditches sneakers for AI compute.
The Rundown frames it as an opportunistic rebrand riding the compute shortage — comparable to the blockchain-era name changes — not a genuine business thesis.
Nate's thesis: AI models run 10–50x faster than humans, but real-world productivity gains are capped at 2–3x because every tool, API, and file system was designed for human speed, not agent speed. Making models infinitely faster still yields only ~2–3x. The bottleneck isn't inference — it's the entire human-affordance stack wrapped around the model [28]Nate B Jones — Your AI Is 50x Faster. You're Getting 2x..
At ~03:00: every timeout, rate limit, auth flow, pagination scheme, and startup sequence was calibrated to human perception and hand-speed. Jeff Dean made the same point at GTC — if an agent is 50x faster, milliseconds lost to tool startup, context switches, and paginated APIs dominate cost. NVIDIA's Bill Dally said inference is now 90% of data-center power, heading to 10–20k tokens/sec/user — consumed by agents, not humans.
We spent a trillion dollars on these agents. We want them to think collectively. We got them to do it. We made the sand to think. Now we're bottlenecking them on tool calls that were designed for humans.
At ~06:06:
MCP has blinded us to where this needs to go — you can take a human-friendly API and stick an MCP over the top and the agent will make do, but that doesn't mean you don't eat wall clock time.
At ~14:10:
I think it's a promotion to the hardest and most valuable job in computing.
DHH argues the golden era of "learned guild" programmers has already peaked: companies treat dev as a cost center, and if AI cuts dev headcount 10x they'll simply take the savings [29]The Pragmatic Engineer — DHH: "We've seen peak programmer". Nate B Jones revisits Tobi Lütke's Red Queen memo — "stagnation is slow-motion failure" — as the defining document for 2026 workforce restructuring [30]Nate B Jones — How the Red Queen memo exposed who will actually survive. Real Python counters that AI code is great for one-off tasks but the trade-offs shift the moment you have to maintain the output [31]Real Python — AI Code Is Great Until You Have to Maintain It.
DHH splits software into two camps: unlimited-scope companies (like his own) that absorb productivity gains by building more, and cost-center shops that pocket the savings. The real constraint, and the real value, shifts to product management — figuring out what to build, who to talk to, where to focus — a role he admits he historically undervalued.
The Red Queen memo (Tobi Lütke, early 2025) forecast role dissolution, junior talent deprioritization, AI fluency as baseline, and dramatic compensation polarization. Nate says all of it is playing out in 2026.
Stagnation is almost certain… if we do nothing. And stagnation is slow-motion failure.
The volume is at 11, and this is happening faster and faster and faster.
If it's a one-way thing, great, cuz then you didn't have to write all that code and it's fantastic. But, as soon as you start maintaining these things, the questions change.
The argument: success stories people share (translating Bootstrap 3 → 4) tend to be tasks nobody touches again. The moment ongoing maintenance enters, readability and the ability to reason about the code become critical — and AI output often falls short.
Four separate security stories hit on the same day: a WordPress supply-chain attack via 31 Flippa-acquired plugins [32]Fireship — WordPress supply-chain attack via Flippa, a disgruntled researcher dropping two Windows Defender zero-days after a bug-bounty dispute [33]Low Level — Windows Defender Blue Hammer / Red Sun zero-days, Google DeepMind's "unhackable" SynthID watermark reverse-engineered via a phase-shift attack [34]Better Stack — SynthID watermark cracked, and a researcher showing how malware can still steal data from Windows Recall — with Microsoft declining to call it a vulnerability [35]Tech Brew — Is it time to recall Windows 11?.
At ~01:01: instead of exploiting a vulnerability, the attacker purchased a portfolio of 31 WordPress plugins on Flippa for a mid-six-figure sum, embedded dormant backdoors ~8 months ago, then activated them. Payloads pulled remote code and modified wp-config.php (containing DB creds + security keys). The C2 domain was resolved through an Ethereum smart contract for instant swapability. 96% of WordPress vulnerabilities originate in its plugin system — PHP scripts with full server privileges and no sandboxing.
The attacker didn't exploit a vulnerability. Instead, they legitimately acquired and took control of a portfolio of plugins by simply purchasing them for money from the original developer on Flippa.
At ~03:04: Cloudflare shipped Mdash, an MIT-licensed WordPress-compatible alternative built on Astro. Each plugin runs in its own sandboxed Cloudflare Worker with capability-based permissions declared in a manifest — directly addressing the full-privilege plugin problem.
A researcher calling themselves "Nightmare Eclipse" released working PoC code for two Windows Defender zero-days — Blue Hammer and Red Sun — after claiming MSRC violated a bug bounty agreement and left them homeless. Both abuse TOCTOU race conditions. Blue Hammer blocks Defender's cloud-file VDM signature update with a fake stub, replaces the VDM file with a symlink pointing to the SAM hive, then lets Defender (running as SYSTEM) snapshot the symlink into a VSS file — the attacker extracts the SAM from the snapshot and uses pass-the-hash to reach admin. Red Sun exploits Defender's behavior of rewriting cloud-tagged malicious files before quarantining; a target swap plus content swap causes Defender to write arbitrary code into System32 and install it as a service. Rust wouldn't have fixed either — these are logic/concurrency issues, not memory safety.
I was not bluffing Microsoft and I'll do it again.
Developer Alouch Denny released "reverse synth ID." By analyzing blank Gemini outputs (Gemini White and Black), they isolated the exact Fourier-transform coordinates where SynthID's spread-spectrum signal lives — and discovered the signal is unequal across channels (green 1.0, red 0.85, blue 0.7) and the phase template is near-identical across all images. A "phase-shift attack" targets specific frequency bins and shifts the watermark's phase just enough to destroy coherence — dropping Google's detector confidence 90%+ while preserving 43 dB PSNR (image looks perfect).
The moment you can see the signal in the math, you can basically delete it.
Windows Recall, which launched publicly in April 2025 after delays, periodically screenshots user activity and makes it AI-searchable. A researcher showed malware can trigger a legitimate Windows security prompt, wait for the user to authenticate, then intercept the vault's contents as they transfer to an unprotected display process. Microsoft's stance: inter-process communication is intentional system behavior, not a vulnerability. Security experts dispute that, noting the attack vector differs from typical short-lived credential exchanges. Windows 11 has faced sustained backlash since 2021 over forced Copilot integrations, Start menu ads, and undisableable AI features — earning the "Microslop" nickname.
Lighter dev-tooling and research grab-bag: Real Python's ChromaDB vector-math primer (magnitude, dot product, cosine similarity with spaCy embeddings) [36]Real Python — Vector Databases and Embeddings With ChromaDB, Better Stack's three Docker-build speedups that took a 10-minute build to under 3 minutes [37]Better Stack — Your Docker Builds Are Slow (and it's your fault), and Data Science Weekly #647's roundup of geospatial CLIP, Sebastian Raschka on coding-agent components, and Nathan Benaich's April 2026 State of AI [38]Data Science Weekly — Issue 647.
At ~00:00: vectors as ordered numerical arrays; three operations (Euclidean norm, dot product, cosine similarity) demonstrated in pure Python then NumPy (np.linalg.norm, np.dot, @). At ~12:08: spaCy's en_core_web_lg (300K+ embeddings, 300 dims); practical comparisons reveal "cat" vs "dog" = 0.80, "tasty" vs "delicious" = 0.92, "cat" vs "spaceship" = 0.13, "delicious" vs "spaceship" = 0.04.
Cosine similarity is the normalized dot product of two vectors. It isn't influenced by their scale, only their direction.
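The three operations collapse into a few NumPy lines; this sketch mirrors the article's approach and reproduces the word comparisons, assuming en_core_web_lg is downloaded:

```python
import numpy as np
import spacy

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Normalized dot product: scale-free, direction-only comparison.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Requires: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

for w1, w2 in [("cat", "dog"), ("tasty", "delicious"), ("cat", "spaceship")]:
    v1, v2 = nlp(w1).vector, nlp(w2).vector
    print(f"{w1!r} vs {w2!r}: {cosine_similarity(v1, v2):.2f}")
```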
Three practical fixes: (1) copy package files and install deps before copying source, to preserve the dep-install layer cache; (2) add a .dockerignore (author cut build context from 500 MB to 20 MB); (3) use BuildKit cache mounts (--mount=type=cache) — author's install step dropped from 3 min to 8 seconds.
Put this all together and your builds can drop from like 10 minutes or so to under 3 minutes. Same code, no new tools, just fixing what most people overlook.